IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
For some certification, it should not be possible to reboot the machine through ctrl-alt-delete. Currently we suggest our customers to mask the ctrl-alt-delete target, but that is obviously not enough.
Patching the keymaps to disable that is really not a way to go for them, because the settings need to be easily checked by some SCAP tools.
Put more emphasis on the routing part. This is the more interesting
thing, and also more complicated and novel.
Explain "search domains" as the special case. Also explain the effect of
~. in more detail.
SYSTEMD_UNIT_PATH=foobar: systemd-analyze verify barbar/unit.service
will load units from barbar/, foobar/, /etc/systemd/system/, etc.
SYSTEMD_UNIT_PATH= systemd-analyze verify barbar/unit.service
will load units only from barbar/, which is useful e.g. when testing
systemd's own units on a system with an older version of systemd installed.
It needs to be possible to tell apart "the nss-resolve module does not exist"
(which can happen when running foreign-architecture programs) from "the queried
DNS name failed DNSSEC validation" or other errors. So return NOTFOUND for these
cases too, and only keep UNAVAIL for the cases where we cannot handle the given
address family.
This makes it possible to configure a fallback to "dns" without breaking
DNSSEC, with "resolve [!UNAVAIL=return] dns". Add this to the manpage.
This does not change behaviour if resolved is not running, as that already
falls back to the "dns" glibc module.
Fixes#4157
DNS servers which have route-only domains should only be used for
the specified domains. Routing queries about other domains there is a privacy
violation, prone to fail (as that DNS server was not meant to be used for other
domains), and puts unnecessary load onto that server.
Introduce a new helper function dns_server_limited_domains() that checks if the
DNS server should only be used for some selected domains, i. e. has some
route-only domains without "~.". Use that when determining whether to query it
in the scope, and when writing resolv.conf.
Extend the test_route_only_dns() case to ensure that the DNS server limited to
~company does not appear in resolv.conf. Add test_route_only_dns_all_domains()
to ensure that a server that also has ~. does appear in resolv.conf as global
name server. These reproduce #3420.
Add a new test_resolved_domain_restricted_dns() test case that verifies that
domain-limited DNS servers are only being used for those domains. This
reproduces #3421.
Clarify what a "routing domain" is in the manpage.
Fixes#3420Fixes#3421
Back when external storage was initially added in 34c10968cb, this mode of
storage was added. This could have made some sense back when XZ compression was
used, and an uncompressed core on disk could be used as short-lived cache file
which does require costly decompression. But now fast LZ4 compression is used
(by default) both internally and externally, so we have duplicated storage,
using the same compression and same default maximum core size in both cases,
but with different expiration lifetimes. Even the uncompressed-external,
compressed-internal mode is not very useful: for small files, decompression
with LZ4 is fast enough not to matter, and for large files, decompression is
still relatively fast, but the disk-usage penalty is very big.
An additional problem with the two modes of storage is that it complicates
the code and makes it much harder to return a useful error message to the user
if we cannot find the core file, since if we cannot find the file we have to
check the internal storage first.
This patch drops "both" storage mode. Effectively this means that if somebody
configured coredump this way, they will get a warning about an unsupported
value for Storage, and the default of "external" will be used.
I'm pretty sure that this mode is very rarely used anyway.
There was no certainty about how the path in service file should look
like for usb functionfs activation. Because of this it was treated
differently in different places, which made this feature unusable.
This patch fixes the path to be the *mount directory* of functionfs, not
ep0 file path and clarifies in the documentation that ListenUSBFunction should be
the location of functionfs mount point, not ep0 file itself.
Make ALSA entries, latency interface, mtrr, apm/acpi, suspend interface,
filesystems configuration and IRQ tuning readonly.
Most of these interfaces now days should be in /sys but they are still
available through /proc, so just protect them. This patch does not touch
/proc/net/...
Let's merge a couple of columns, to make the table a bit shorter. This
effectively just drops whitespace, not contents, but makes the currently
humungous table much much more compact.
This reworks the documentation for ReadOnlyPaths=, ReadWritePaths=,
InaccessiblePaths=. It no longer claims that we'd follow symlinks relative to
the host file system. (Which wasn't true actually, as we didn't follow symlinks
at all in the most recent releases, and we know do follow them, but relative to
RootDirectory=).
This also replaces all references to the fact that all fs namespacing options
can be undone with enough privileges and disable propagation by a single one in
the documentation of ReadOnlyPaths= and friends, and then directs the read to
this in all other places.
Moreover a hint is added to the documentation of SystemCallFilter=, suggesting
usage of ~@mount in case any of the fs namespacing related options are used.
Let's drop the reference to the cap_from_name() function in the documentation
for the capabilities setting, as it is hardly helpful. Our readers are not
necessarily C hackers knowing the semantics of cap_from_name(). Moreover, the
strings we accept are just the plain capability names as listed in
capabilities(7) hence there's really no point in confusing the user with
anything else.
Let's make sure that services that use DynamicUser=1 cannot leave files in the
file system should the system accidentally have a world-writable directory
somewhere.
This effectively ensures that directories need to be whitelisted rather than
blacklisted for access when DynamicUser=1 is set.
Let's tighten our sandbox a bit more: with this change ProtectSystem= gains a
new setting "strict". If set, the entire directory tree of the system is
mounted read-only, but the API file systems /proc, /dev, /sys are excluded
(they may be managed with PrivateDevices= and ProtectKernelTunables=). Also,
/home and /root are excluded as those are left for ProtectHome= to manage.
In this mode, all "real" file systems (i.e. non-API file systems) are mounted
read-only, and specific directories may only be excluded via
ReadWriteDirectories=, thus implementing an effective whitelist instead of
blacklist of writable directories.
While we are at, also add /efi to the list of paths always affected by
ProtectSystem=. This is a follow-up for
b52a109ad3 which added /efi as alternative for
/boot. Our namespacing logic should respect that too.
Also update the description of drop-ins in systemd.unit(5) to say that .d
directories, not .conf files, are in /etc/system/system, /run/systemd/system,
etc.
This splits the OS field in two : one for the distribution name
and one for the the version id.
Dashes are written for missing fields.
This also prints ip addresses of known machines. The `--max-addresses`
option specifies how much ip addresses we want to see. The default is 1.
When more than one address is written for a machine, a `,` follows it.
If there are more ips than `--max-addresses`, `...` follows the last
address.
This adds two (privileged) bus calls Ref() and Unref() to the Unit interface.
The two calls may be used by clients to pin a unit into memory, so that various
runtime properties aren't flushed out by the automatic GC. This is necessary
to permit clients to race-freely acquire runtime results (such as process exit
status/code or accumulated CPU time) on successful service termination.
Ref() and Unref() are fully recursive, hence act like the usual reference
counting concept in C. Taking a reference is a privileged operation, as this
allows pinning units into memory which consumes resources.
Transient units may also gain a reference at the time of creation, via the new
AddRef property (that is only defined for transient units at the time of
creation).
And while ware at it, also drop some references to kdbus, and stop claiming
sd-bus wasn't stable yet. Also order man page references in the main sd-bus man
page alphabetically.
This changes the semantics a bit: before, SYSTEMD_COLORS= would be treated as
"yes", same as SYSTEMD_COLORS=xxx and SYSTEMD_COLORS=1, and only
SYSTEMD_COLORS=0 would be treated as "no". Now, only valid booleans are treated
as "yes". This actually matches how $SYSTEMD_COLORS was announced in NEWS.
The man pages didn't ever mention that symlinks to units can be created, and what
exactly this means. Fix that omission, and disallow presets on alias names.
The parsing functions for [User]TasksMax were inconsistent. Empty string and
"infinity" were interpreted as no limit for TasksMax but not accepted for
UserTasksMax. Update them so that they're consistent with other knobs.
* Empty string indicates the default value.
* "infinity" indicates no limit.
While at it, replace opencoded (uint64_t) -1 with CGROUP_LIMIT_MAX in TasksMax
handling.
v2: Update empty string to indicate the default value as suggested by Zbigniew
Jędrzejewski-Szmek.
v3: Fixed empty UserTasksMax handling.
This adds the boolean RemoveIPC= setting to service, socket, mount and swap
units (i.e. all unit types that may invoke processes). if turned on, and the
unit's user/group is not root, all IPC objects of the user/group are removed
when the service is shut down. The life-cycle of the IPC objects is hence bound
to the unit life-cycle.
This is particularly relevant for units with dynamic users, as it is essential
that no objects owned by the dynamic users survive the service exiting. In
fact, this patch adds code to imply RemoveIPC= if DynamicUser= is set.
In order to communicate the UID/GID of an executed process back to PID 1 this
adds a new "user lookup" socket pair, that is inherited into the forked
processes, and closed before the exec(). This is needed since we cannot do NSS
from PID 1 due to deadlock risks, However need to know the used UID/GID in
order to clean up IPC owned by it if the unit shuts down.
This adds "systemd-mount" which is for transient mount and automount units what
"systemd-run" is for transient service, scope and timer units.
The tool allows establishing mounts and automounts during runtime. It is very
similar to the usual /bin/mount commands, but can pull in additional
dependenices on access (for example, it pulls in fsck automatically), an take
benefit of the automount logic.
This tool is particularly useful for mount removable file systems (such as USB
sticks), as the automount logic (together with automatic unmount-on-idle), as
well as automatic fsck on first access ensure that the removable file system
has a high chance to remain in a fully clean state even when it is unplugged
abruptly, and returns to a clean state on the next re-plug.
This is a follow-up for #2471, as it adds a simple client-side for the
transient automount logic added in that PR.
In later work it might make sense to invoke this tool automatically from udev
rules in order to implement a simpler and safer version of removable media
management á la udisks.
core: add cgroup CPU controller support on the unified hierarchy
(zj: merging not squashing to make it clear against which upstream this patch was developed.)
It is useful to look at a (possibly inactive) container or other os tree
with --root=/path/to/container. This is similar to specifying
--directory=/path/to/container/var/log/journal --directory=/path/to/container/run/systemd/journal
(if using --directory multiple times was allowed), but doesn't require
as much typing.
Unfortunately, due to the disagreements in the kernel development community,
CPU controller cgroup v2 support has not been merged and enabling it requires
applying two small out-of-tree kernel patches. The situation is explained in
the following documentation.
https://git.kernel.org/cgit/linux/kernel/git/tj/cgroup.git/tree/Documentation/cgroup-v2-cpu.txt?h=cgroup-v2-cpu
While it isn't clear what will happen with CPU controller cgroup v2 support,
there are critical features which are possible only on cgroup v2 such as
buffered write control making cgroup v2 essential for a lot of workloads. This
commit implements systemd CPU controller support on the unified hierarchy so
that users who choose to deploy CPU controller cgroup v2 support can easily
take advantage of it.
On the unified hierarchy, "cpu.weight" knob replaces "cpu.shares" and "cpu.max"
replaces "cpu.cfs_period_us" and "cpu.cfs_quota_us". [Startup]CPUWeight config
options are added with the usual compat translation. CPU quota settings remain
unchanged and apply to both legacy and unified hierarchies.
v2: - Error in man page corrected.
- CPU config application in cgroup_context_apply() refactored.
- CPU accounting now works on unified hierarchy.
Let's extend nss-systemd to also synthesize user/group entries for the
UIDs/GIDs 0 and 65534 which have special kernel meaning. Given that nss-systemd
is listed in /etc/nsswitch.conf only very late any explicit listing in
/etc/passwd or /etc/group takes precedence.
This functionality is useful in minimal container-like setups that lack
/etc/passwd files (or only have incompletely populated ones).
This should simplify monitoring tools for services, by passing the most basic
information about service result/exit information via environment variables,
thus making it unnecessary to retrieve them explicitly via the bus.
This new output mode formats all timestamps using the usual format_timestamp()
call we use pretty much everywhere else. Timestamps formatted this way are some
ways more useful than traditional syslog timestamps as they include weekday,
month and timezone information, while not being much longer. They are also not
locale-dependent. The primary advantage however is that they may be passed
directly to journalctl's --since= and --until= switches as soon as #3869 is
merged.
While we are at it, let's also add "short-unix" to shell completion.
This patch improves parsing and generation of timestamps and calendar
specifications in two ways:
- The week day is now always printed in the abbreviated English form, instead
of the locale's setting. This makes sure we can always parse the week day
again, even if the locale is changed. Given that we don't follow locale
settings for printing timestamps in any other way either (for example, we
always use 24h syntax in order to make uniform parsing possible), it only
makes sense to also stick to a generic, non-localized form for the timestamp,
too.
- When parsing a timestamp, the local timezone (in its DST or non-DST name)
may be specified, in addition to "UTC". Other timezones are still not
supported however (not because we wouldn't want to, but mostly because libc
offers no nice API for that). In itself this brings no new features, however
it ensures that any locally formatted timestamp's timezone is also parsable
again.
These two changes ensure that the output of format_timestamp() may always be
passed to parse_timestamp() and results in the original input. The related
flavours for usec/UTC also work accordingly. Calendar specifications are
extended in a similar way.
The man page is updated accordingly, in particular this removes the claim that
timestamps systemd prints wouldn't be parsable by systemd. They are now.
The man page previously showed invalid timestamps as examples. This has been
removed, as the man page shouldn't be a unit test, where such negative examples
would be useful. The man page also no longer mentions the names of internal
functions, such as format_timestamp_us() or UNIX error codes such as EINVAL.
This setting adds minimal user namespacing support to a service. When set the invoked
processes will run in their own user namespace. Only a trivial mapping will be
set up: the root user/group is mapped to root, and the user/group of the
service will be mapped to itself, everything else is mapped to nobody.
If this setting is used the service runs with no capabilities on the host, but
configurable capabilities within the service.
This setting is particularly useful in conjunction with RootDirectory= as the
need to synchronize /etc/passwd and /etc/group between the host and the service
OS tree is reduced, as only three UID/GIDs need to match: root, nobody and the
user of the service itself. But even outside the RootDirectory= case this
setting is useful to substantially reduce the attack surface of a service.
Example command to test this:
systemd-run -p PrivateUsers=1 -p User=foobar -t /bin/sh
This runs a shell as user "foobar". When typing "ps" only processes owned by
"root", by "foobar", and by "nobody" should be visible.
This removes the --share-system switch: from the documentation, the --help text
as well as the command line parsing. It's an ugly option, given that it kinda
contradicts the whole concept of PID namespaces that nspawn implements. Since
it's barely ever used, let's just deprecate it and remove it from the options.
It might be useful as a debugging option, hence the functionality is kept
around for now, exposed via an undocumented $SYSTEMD_NSPAWN_SHARE_SYSTEM
environment variable.
This complements graphical-session.target for services which set up the
environment (e. g. dbus-update-activation-environment) and need to run before
the actual graphical session.
They were outdated, and this way it's less likely that they'll get out of sync
again. Anyway, it's easier for the reader to have the kernel and config file
options next to one another.
In this mode, messages from processes which are not part of the session
land in the main journal file, and only output of processes which are
properly part of the session land in the user's journal. This is
confusing, in particular because systemd-coredump runs outside of the
login session.
"Deprecate" SplitMode=login by removing it from documentation, to
discourage people from using it.
This unit acts as a dynamic "alias" target for any concrete graphical user
session like gnome-session.target; these should declare
"BindsTo=graphical-session.target" so that both targets stop and start at the
same time.
This allows services that run in a particular graphical user session (e. g.
gnome-settings-daemon.service) to declare "PartOf=graphical-session.target"
without having to know or get updated for all/new session types. This will
ensure that stopping the graphical session will stop all services which are
associated to it.
As suggested by @mbiebl we already use the "!" special char in unit file
assignments for negation, hence we should not use it in a different context for
privileged execution. Let's use "+" instead.
To "search something", in the meaning of looking for it, is valid,
but "search _for_ something" is much more commonly used, especially when
the meaning could be confused with "looking _through_ something"
(for some other object).
(C.f. "the police search a person", "the police search for a person".)
Also reword the rest of the paragraph to avoid using "automatically"
three times.
Not as many people use chroot as before, so make the flow a bit nicer by
talking less about chroot.
"change to the either" is awkward and unclear. Just remove that part,
because all changes are lost, period.
Clarify that "systemctl enable" can operate either on unit names or on unit
file paths (also, adjust the --help text to clarify this). Say that "systemctl
enable" on unit file paths also links the unit into the search path.
Many other fixes.
This should improve the documentation to avoid further confusion around #3706.
Let's not mention the supposed security benefit of turning off caching. It is
really questionnable, and I#d rather not create the impression that we actually
believed turning off caching would be a good idea.
Instead, mention that Cache=no is implicit if a DNS server on the local host is
used.
This adds a new boolean setting DynamicUser= to service files. If set, a new
user will be allocated dynamically when the unit is started, and released when
it is stopped. The user ID is allocated from the range 61184..65519. The user
will not be added to /etc/passwd (but an NSS module to be added later should
make it show up in getent passwd).
For now, care should be taken that the service writes no files to disk, since
this might result in files owned by UIDs that might get assigned dynamically to
a different service later on. Later patches will tighten sandboxing in order to
ensure that this cannot happen, except for a few selected directories.
A simple way to test this is:
systemd-run -p DynamicUser=1 /bin/sleep 99999
As it turns out 512 is max number of tasks per service is hit by too many
applications, hence let's bump it a bit, and make it relative to the system's
maximum number of PIDs. With this change the new default is 15%. At the
kernel's default pids_max value of 32768 this translates to 4915. At machined's
default TasksMax= setting of 16384 this translates to 2457.
Why 15%? Because it sounds like a round number and is close enough to 4096
which I was going for, i.e. an eight-fold increase over the old 512
Summary:
| on the host | in a container
old default | 512 | 512
new default | 4915 | 2457
Let's change from a fixed value of 12288 tasks per user to a relative value of
33%, which with the kernel's default of 32768 translates to 10813. This is a
slight decrease of the limit, for no other reason than "33%" sounding like a nice
round number that is close enough to 12288 (which would translate to 37.5%).
(Well, it also has the nice effect of still leaving a bit of room in the PID
space if there are 3 cooperating evil users that try to consume all PIDs...
Also, I like my bikesheds blue).
Since the new value is taken relative, and machined's TasksMax= setting
defaults to 16384, 33% inside of containers is usually equivalent to 5406,
which should still be ample space.
To summarize:
| on the host | in the container
old default | 12288 | 12288
new default | 10813 | 5406
This adds support for a TasksMax=40% syntax for specifying values relative to
the system's configured maximum number of processes. This is useful in order to
neatly subdivide the available room for tasks within containers.
This rearranges bootctl a bit, so that it uses the usual verbs parsing
routines, and automatically searches the ESP in /boot, /efi or /boot/efi, thus
increasing compatibility with mainstream distros that insist on /boot/efi.
This also adds minimal support for running bootctl in a container environment:
when run inside a container verification of the ESP via raw block device
access, trusting the container manager to mount the ESP correctly. Moreover,
EFI variables are not accessed when running in the container.
Let's make the EFI generator a bit smarter: if /efi exists it is used as mount
point for the ESP, otherwise /boot is used. This should increase compatibility
with distros which use legacy boot loaders that insist on having /boot as
something that isn't the ESP.
If the ESP is not mounted with "iocharset=ascii", but with "iocharset=utf8"
(which is for example the default in Debian), the file system becomes case
sensitive. This means that a file created as "FooBarBaz" cannot be accessed as
"foobarbaz" since those are then considered different files.
Moreover, a file created as "FooBar" can then also not be accessed as "foobar",
and it also prevents such a file from being created, as both would use the same
8.3 short name "FOOBAR".
Even though the UEFI specification [0] does give the canonical spelling for
the files mentioned above, not all implementations completely conform to that,
so it's possible that those files would already exist, but with a different
spelling, causing subtle bugs when scanning or modifying the ESP.
While the proper fix would of course be that everybody conformed to the
standard, we can work around this problem by just referencing the files by
their 8.3 short names, i.e. using upper case.
Fixes: #3740
[0] <http://www.uefi.org/specifications>, version 2.6, section 3.5.1.1
* Specifying a device node has an effect much larger than a simple shortcut
for a field/value match, so the original sentence is no longer a good way
to start the paragraph.
* Specifying a device node causes matches to be generated for all ancestor
devices of the device specified, not just its parents.
* Indicates that the path must be absolute, but that it may be a link.
* Eliminates a few typos.
This patch renames Read{Write,Only}Directories= and InaccessibleDirectories=
to Read{Write,Only}Paths= and InaccessiblePaths=, previous names are kept
as aliases but they are not advertised in the documentation.
Renamed variables:
`read_write_dirs` --> `read_write_paths`
`read_only_dirs` --> `read_only_paths`
`inaccessible_dirs` --> `inaccessible_paths`
Despite the name, `Read{Write,Only}Directories=` already allows for
regular file paths to be masked. This commit adds the same behavior
to `InaccessibleDirectories=` and makes it explicit in the doc.
This patch introduces `/run/systemd/inaccessible/{reg,dir,chr,blk,fifo,sock}`
{dile,device}nodes and mounts on the appropriate one the paths specified
in `InacessibleDirectories=`.
Based on Luca's patch from https://github.com/systemd/systemd/pull/3327
The distinction between systemd-shutdown the binary vs system-shutdown
the hook directory (without the 'd') is not immediately obvious and can
be quite confusing if you are looking for a directory which doesn't exist.
Therefore explicitly mention the hook directory in the synopsis with a
trailing slash to make it clearer which is which.
systemd.special.xml: corrections about implicit
dependencies for basic.target, sysinit.target and shutdown.target.
systemd.target.xml: corrections about implicit dependencies for
target units in general.
This extends the existing event loop iteration counter to 64bit, and exposes it
via a new function sd_event_get_iteration(). This is helpful for cases like
issue #3612. After all, since we maintain the counter anyway, we might as well
expose it.
(This also fixes an unrelated issue in the man page for sd_event_wait() where
micro and milliseconds got mixed up)
Type=notify has a magic overriding case where a NotifyAccess=none
is turned into a NotifyAccess=main for sanity purposes.
This makes docs more clear about such behavior:
2787d83c28/src/core/service.c (L650):L651
My educated guess is that #3561 was filed due to confusion around the
systemd-resolve "Data Authenticated:" output. Let's try to clean up the
confusion a bit, and document what it means in the man page.
In some cases, caching DNS results locally is not desirable, a it makes DNS
cache poisoning attacks a tad easier and also allows users on the system to
determine whether or not a particular domain got visited by another user. Thus
provide a new "Cache" resolved.conf option to disable it.
This change documents the existance of the systemd-nspawn@.service template
unit file, which was previously not mentioned at all. Since the unit file uses
slightly different default than nspawn invoked from the command line, these
defaults are now explicitly documented too.
A couple of further additions and changes are made, too.
Replaces: #3497
Add sd_notify() parameter to change watchdog_usec during runtime.
Application can change watchdog_usec value by
sd_notify like this. Example. sd_notify(0, "WATCHDOG_USEC=20000000").
To reset watchdog_usec as configured value in service file,
restart service.
Notice.
sd_event is not currently supported. If application uses
sd_event_set_watchdog, or sd_watchdog_enabled, do not use
"WATCHDOG_USEC" option through sd_notify.
The new command shows the per-link and global DNS configuration currently in
effect. This is useful to quickly see the DNS settings resolved acquired from
networkd and that was pushed into it via the bus APIs.
In makefile we create symlinks runlevel5.target to graphical.target and
runlevel2-4.target to multi-user.target. Let's say the same thing in
systemd.special manpage.
This permits services to detect whether their stdout/stderr is connected to the
journal, and if so talk to the journal directly, thus permitting carrying of
metadata.
As requested by the gtk folks: #2473
If a percentage is used, it is taken relative to the installed RAM size. This
should make it easier to write generic unit files that adapt to the local system.
This adds three new seccomp syscall groups: @keyring for kernel keyring access,
@cpu-emulation for CPU emulation features, for exampe vm86() for dosemu and
suchlike, and @debug for ptrace() and related calls.
Also, the @clock group is updated with more syscalls that alter the system
clock. capset() is added to @privileged, and pciconfig_iobase() is added to
@raw-io.
Finally, @obsolete is a cleaned up. A number of syscalls that never existed on
Linux and have no number assigned on any architecture are removed, as they only
exist in the man pages and other operating sytems, but not in code at all.
create_module() is moved from @module to @obsolete, as it is an obsolete system
call. mem_getpolicy() is removed from the @obsolete list, as it is not
obsolete, but simply a NUMA API.
This patch implements the new magic character '!'. By putting '!' in front
of a command, systemd executes it with full privileges ignoring paramters
such as User, Group, SupplementaryGroups, CapabilityBoundingSet,
AmbientCapabilities, SecureBits, SystemCallFilter, SELinuxContext,
AppArmorProfile, SmackProcessLabel, and RestrictAddressFamilies.
Fixes partially https://github.com/systemd/systemd/issues/3414
Related to https://github.com/coreos/rkt/issues/2482
Testing:
1. Create a user 'bob'
2. Create the unit file /etc/systemd/system/exec-perm.service
(You can use the example below)
3. sudo systemctl start ext-perm.service
4. Verify that the commands starting with '!' were not executed as bob,
4.1 Looking to the output of ls -l /tmp/exec-perm
4.2 Each file contains the result of the id command.
`````````````````````````````````````````````````````````````````
[Unit]
Description=ext-perm
[Service]
Type=oneshot
TimeoutStartSec=0
User=bob
ExecStartPre=!/usr/bin/sh -c "/usr/bin/rm /tmp/exec-perm*" ;
/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-pre"
ExecStart=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start" ;
!/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-star-2"
ExecStartPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-start-post"
ExecReload=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-reload"
ExecStop=!/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop"
ExecStopPost=/usr/bin/sh -c "/usr/bin/id > /tmp/exec-perm-stop-post"
[Install]
WantedBy=multi-user.target]
`````````````````````````````````````````````````````````````````
This the patch implements a notificaiton mechanism from the init process
in the container to systemd-nspawn.
The switch --notify-ready=yes configures systemd-nspawn to wait the "READY=1"
message from the init process in the container to send its own to systemd.
--notify-ready=no is equivalent to the previous behavior before this patch,
systemd-nspawn notifies systemd with a "READY=1" message when the container is
created. This notificaiton mechanism uses socket file with path relative to the contanier
"/run/systemd/nspawn/notify". The default values it --notify-ready=no.
It is also possible to configure this mechanism from the .nspawn files using
NotifyReady. This parameter takes the same options of the command line switch.
Before this patch, systemd-nspawn notifies "ready" after the inner child was created,
regardless the status of the service running inside it. Now, with --notify-ready=yes,
systemd-nspawn notifies when the service is ready. This is really useful when
there are dependencies between different contaniers.
Fixes https://github.com/systemd/systemd/issues/1369
Based on the work from https://github.com/systemd/systemd/pull/3022
Testing:
Boot a OS inside a container with systemd-nspawn.
Note: modify the commands accordingly with your filesystem.
1. Create a filesystem where you can boot an OS.
2. sudo systemd-nspawn -D ${HOME}/distros/fedora-23/ sh
2.1. Create the unit file /etc/systemd/system/sleep.service inside the container
(You can use the example below)
2.2. systemdctl enable sleep
2.3 exit
3. sudo systemd-run --service-type=notify --unit=notify-test
${HOME}/systemd/systemd-nspawn --notify-ready=yes
-D ${HOME}/distros/fedora-23/ -b
4. In a different shell run "systemctl status notify-test"
When using --notify-ready=yes the service status is "activating" for 20 seconds
before being set to "active (running)". Instead, using --notify-ready=no
the service status is marked "active (running)" quickly, without waiting for
the 20 seconds.
This patch was also test with --private-users=yes, you can test it just adding it
at the end of the command at point 3.
------ sleep.service ------
[Unit]
Description=sleep
After=network.target
[Service]
Type=oneshot
ExecStart=/bin/sleep 20
[Install]
WantedBy=multi-user.target
------------ end ------------
The long name is just too hard to type. We generally should avoid using
acronyms too liberally, if they aren't established enough, but it appears that
"RA" is known well enough. Internally we call the option "ipv6_accept_ra"
anyway, and the kernel also exposes it under this name. Hence, let's rename the
IPv6AcceptRouterAdvertisements= setting and the
[IPv6AcceptRouterAdvertisements] section to IPv6AcceptRA= and [IPv6AcceptRA].
The old setting IPv6AcceptRouterAdvertisements= is kept for compatibility with
older configuration. (However the section [IPv6AcceptRouterAdvertisements] is
not, as it was never available in a published version of systemd.
Debian and their derivatives (Ubuntu, Trisquel, etc.) use a code name
for their repositories. Thus record the code name in os-release for
processing.
Closessystemd/systemd#3429
This reworks sd-ndisc and networkd substantially to support IPv6 RA much more
comprehensively. Since the API is extended quite a bit networkd has been ported
over too, and the patch is not as straight-forward as one could wish. The
rework includes:
- Support for DNSSL, RDNSS and RA routing options in sd-ndisc and networkd. Two
new configuration options have been added to networkd to make this
configurable.
- sd-ndisc now exposes an sd_ndisc_router object that encapsulates a full RA
message, and has direct, friendly acessor functions for the singleton RA
properties, as well as an iterative interface to iterate through known and
unsupported options. The router object may either be retrieved from the wire,
or generated from raw data. In many ways the sd-ndisc API now matches the
sd-lldp API, except that no implicit database of seen data is kept. (Note
that sd-ndisc actually had a half-written, but unused implementaiton of such
a store, which is removed now.)
- sd-ndisc will now collect the reception timestamps of RA, which is useful to
make sd_ndisc_router fully descriptive of what it covers.
Fixes: #1079
New exec boolean MemoryDenyWriteExecute, when set, installs
a seccomp filter to reject mmap(2) with PAGE_WRITE|PAGE_EXEC
and mprotect(2) with PAGE_EXEC.
Recently added cgroup unified hierarchy support uses "max" in configurations
for no upper limit. While consistent with what the kernel uses for no upper
limit, it is inconsistent with what systemd uses for other controllers such as
memory or pids. There's no point in introducing another term. Update cgroup
unified hierarchy support so that "infinity" is the only term that systemd
uses for no upper limit.