1
0
mirror of https://github.com/systemd/systemd.git synced 2025-01-03 05:18:09 +03:00

Compare commits

...

108 Commits

Author SHA1 Message Date
Lennart Poettering
79c3916335
Merge a8f6814be8 into ccaa76ac48 2024-12-20 21:29:02 +00:00
Lennart Poettering
a8f6814be8 update TODO 2024-12-20 22:17:06 +01:00
Lennart Poettering
1fa2c9ed26 test: test comprehensive tests for new (and old) nspawn userns modes 2024-12-20 22:17:06 +01:00
Lennart Poettering
327daea5e9 man: document new nspawn functionality around unpriv support 2024-12-20 22:17:06 +01:00
Lennart Poettering
29df5667c3 nspawn: add support for 'managed' userns mode even when we run privileged 2024-12-20 22:17:06 +01:00
Lennart Poettering
9eef1fe103 nspawn: drop duplicate logic for moving to an alternative root to operate on
We actually already have this logic in run(), that under
various conditions uses a different root fs to operate on. No need to do
this twice.

(Or to say this in stronger words: this is dead code, because run()
aleady moved the root fs to something else if it sees "/" being used.
2024-12-20 22:17:06 +01:00
Lennart Poettering
26a48aee01 nspawn: close root mount fd when we don't need it anymore 2024-12-20 22:17:06 +01:00
Lennart Poettering
f48026f74a nspawn: drop some redundant {} 2024-12-20 22:17:06 +01:00
Lennart Poettering
7fb9144e65 nspawn: improve log messages a bit 2024-12-20 22:17:06 +01:00
Lennart Poettering
a440d138a5 nspawn: support foreign mappings also when nspawn doing the mapping itself 2024-12-20 22:17:06 +01:00
Lennart Poettering
5f1ac9e3c0 cgroup: when we fail to clean up a cgroup, let's ask PID 1 for help 2024-12-20 22:17:06 +01:00
Lennart Poettering
dd0b3a9215 pid1: add D-Bus API for removing delegated subcgroups
When running unprivileged containers, we run into a scenario where an
unpriv owned cgroup has a subcgroup delegated to another user (i.e. the
container's own UIDs). When the owner of that cgroup dies without
cleaning it up then the unpriv service manager might encounter a cgroup
it cannot delete anymore.

Let's address that: let's expose a method call on the service manager
(primarly in PID1) that can be used to delete a subcgroup of a unit one
owns. This would then allow the unpriv service manager to ask the priv
service manager to get rid of such a cgroup.

This commit only adds the method call, the next commit then adds the
code that makes use of this.
2024-12-20 22:17:06 +01:00
Lennart Poettering
6bfbdd3110 pid1: allow moving processes in a userns owned by the user, too 2024-12-20 22:17:06 +01:00
Lennart Poettering
02ab91301c namespace-util: add process_is_owned_by_uid() helper 2024-12-20 22:17:06 +01:00
Lennart Poettering
a23142a361 namespace-util: add helper to get base UID from userns 2024-12-20 22:17:06 +01:00
Lennart Poettering
fedc35489b dissect: employ vpick also if we operate on a directory-based image 2024-12-20 22:17:06 +01:00
Lennart Poettering
655265526a dissect: add a bit of color to --discover table 2024-12-20 22:17:06 +01:00
Lennart Poettering
90b730eacc dissect: show all kinds of images in --discover
Given that systemd-dissect can nowadays operate on plain directories,
let's include directory images in the --discover output too.

Replace the filter with a filter for hidden images instead, as suddenly
the root fs image (which is a directory image ".host") otherwise shows up.
2024-12-20 22:17:05 +01:00
Lennart Poettering
fd7266383a nspawn: allow to run unpriv from dir 2024-12-20 22:15:21 +01:00
Lennart Poettering
7149009417 dissect: add new --shift command 2024-12-20 22:15:18 +01:00
Lennart Poettering
f979247bb9 nspawn: move uid shift/chown() code into shared/ 2024-12-20 22:13:05 +01:00
Lennart Poettering
1703d200f8 dissect-image: add client side API wrapper for MountDirectory() varlink call 2024-12-20 22:13:05 +01:00
Lennart Poettering
6e50124e24 mntfsd: add api to mount dirs for containers
Note that we have to drop various sandboxing knobs from the mountfsd
service file for this to work, since the kernel's security checks that
try to ensure than an obstructued /proc/ cannot be circumvented via
mounting a new procfs will otherwise prohibit mountfsd to duplicate the
mounts properly.
2024-12-20 22:13:05 +01:00
Lennart Poettering
a0b9d4d296 userdb: synthesize stub user records for the foreign UID 2024-12-20 22:13:05 +01:00
Lennart Poettering
19634b0c69 user-classification: add new "foreign" UID range
This makes the UID range configurable via build time options, but of
course it really shouldn't be changed. The default range I picked is
outside even of IPAs current (ridiculously large) allocation ranges,
hence hopefully minimizes conflicts.
2024-12-20 22:13:05 +01:00
Lennart Poettering
796d87aa73 dissect: minor simplifications 2024-12-20 22:13:05 +01:00
Lennart Poettering
4ee940f5e2 dissect-image: rename ReplyParameters → MountImageReplyParameters 2024-12-20 22:13:05 +01:00
Lennart Poettering
c448f9c015 nsresourced: add ability to mangle specified name if necessary
Let's optionally mangle any passed name on the server side so that it is
useful for identifying a userns, if it isn't suitable for that
right-away. This mostl means truncating it if too long.

It's just too nasty to leave this to the client side, since they'd have
to understand the precise rules for naming userns then.

While we are at it, add full Varlink IDL comments.
2024-12-20 22:13:05 +01:00
Lennart Poettering
86895e29c8 nspawn: rework userns_mkdir() around chase() 2024-12-20 22:13:05 +01:00
Lennart Poettering
56ffb40ad4 fs-util: fail xopenat() when called with O_EXCL but without file name with EEXIST 2024-12-20 22:13:05 +01:00
Lennart Poettering
674f29d402 fs-util: teach xopenat_full() to pick automatically if given as MODE_INVALID 2024-12-20 22:13:05 +01:00
Lennart Poettering
ccaa76ac48
image-discovery: add per-user scope (#35510) 2024-12-20 22:12:35 +01:00
Lennart Poettering
2232038187
pid1: complete per-user credentials support (#35536)
Fixes: #33887 #33796 #33318
2024-12-20 22:12:08 +01:00
Lennart Poettering
1563404159
analyze: extend CHID support to more types (#35699)
Let's implement the spec more comprehensively.

This is piece by piece work, There's more to do on the EFI side before
all CHID types are supported, but in userspace it should be reasonably
complete now.
2024-12-20 22:11:39 +01:00
Daan De Meyer
2138278d25
Various mkosi improvements (#35684) 2024-12-20 21:24:51 +01:00
Daan De Meyer
34b5a27b0b docs: Simplify hacking documentation
Let's use "mkosi sandbox" in the docs so that users can build systemd
without having to install anything except mkosi. Using mkosi sandbox
will use tools and dependencies from the tools tree which is also used
in CI and thus has a higher chance of working from the first try compared
to whatever tools might be installed on the host system of a new contributor.
2024-12-20 20:09:36 +01:00
Daan De Meyer
ba3f148307 mkosi.clangd: Fail on command errors 2024-12-20 20:09:36 +01:00
Daan De Meyer
b133f57544 mkosi.clangd: Don't pass --host if we're not using flatpak-spawn 2024-12-20 20:09:36 +01:00
Daan De Meyer
8c5b4df543 mkosi: Use build/ as extra search path by default
Building systemd with mkosi generally requires a very recent version
of systemd which might not be installed on the host. Let's configure
mkosi to look for extra executables in the build/ directory by default
so that we prefer systemd executables from the build directory over those
on the host as those on the host are likely to be too old.
2024-12-20 20:09:36 +01:00
Daan De Meyer
1995084a9e mkosi: Use tools tree by default
Let's enable usage of a tools tree by default to simplify the setup
for new contributors and save them from having to install or upgrade
a bunch of extra tools to get mkosi working as expected.
2024-12-20 20:09:35 +01:00
Daan De Meyer
ac1a711d9a mkosi: Enable EPEL for CentOS Stream tools tree
We need packages from EPEL to be able to build CentOS Stream images
with a CentOS Stream tools tree so enable it. This is broken on CentOS
Stream 10 but given using a CentOS Stream tools tree is broken without
EPEL as well, we might as well enable it and just wait until the packages
are added to EPEL 10.
2024-12-20 20:09:35 +01:00
Daan De Meyer
d4dda34854 mkosi: Add libz1 to opensuse tools tree
Without meson fails to configure properly.
2024-12-20 20:09:35 +01:00
Daan De Meyer
7337f4b197 mkosi: Add gdb to tools tree 2024-12-20 20:09:35 +01:00
Daan De Meyer
3ee5cab490 docs: Move fuzzers documentation to test README.md 2024-12-20 20:09:35 +01:00
Daan De Meyer
3add2d73b3 coverage: Run on pull request in a few cases
If we're changing the integration test wrapper or coverage.yml, let's
run the coverage workflow on PRs as well to make sure it doesn't break.
2024-12-20 20:09:35 +01:00
Daan De Meyer
1dd345b00d mkosi: Update to latest 2024-12-20 20:09:35 +01:00
Lennart Poettering
8ca50bde48 analyze-chid: fully support all CHID types
This adds logic to read the missing SMBIOS fields from userspace, too.
With this we should have full CHID coverage now, matching fwupd's output
fully.
2024-12-20 18:13:18 +01:00
Lennart Poettering
0eb51d9913 analyze-chid: split out code that reads smbios into helper 2024-12-20 18:13:18 +01:00
Lennart Poettering
6b99f3ba5a analyze: C escape weird chars in SMBIOS fields
just in case, let's not write garbled crap to the TTY but escape and
potential weird chars before output.
2024-12-20 18:13:18 +01:00
Lennart Poettering
95cd07e772 chid: add missing CHID type definitions
This add he missing CHID types to our tables, but doesn't add all
necessary code to calculate them yet.

This brings us closer to what the CHID spec documents, and what
"fupwdtool hwids" outputs.
2024-12-20 18:13:18 +01:00
Lennart Poettering
0f55038c84 analyze-chid: show friendly smbios field names
Some of the field names between kernel and smbios spec differ. Kinda
confusing. Let's use the smbios field names, to match the CHID spec,
which also uses them, and thus be least confusing, treating kernel
attribute fields as an internal Linux thing only.
2024-12-20 18:13:18 +01:00
Lennart Poettering
37e02b455b analyze: not all smbios fields are always defined, deal with that
As per previous commit, accept that not all SMBIOS fields are alwaysa
available (or set, but empty), hence handle this gracefully and don't
generate relevant CHIDs, as per docs.
2024-12-20 18:13:18 +01:00
Lennart Poettering
a04af8516e chid-fundamental: rework bit checking to use FLAGS_SET() 2024-12-20 18:13:16 +01:00
Lennart Poettering
094e2ace12 chid-fundamental: use right type to iterate through smbios fields 2024-12-20 18:06:34 +01:00
Lennart Poettering
f8988a5e45 chid-fundamental: make namespace GUID static, too 2024-12-20 18:06:34 +01:00
Lennart Poettering
d1bbfaeba5 chid-fundamental: not all SMBIOS fields are available on all systems
And the CHID documentation says that CHIDs that require fields that are
not available on the local system should not be generated. Follow that,
and generate a NULL CHID in that case (which we generally ignore
otherwise).
2024-12-20 18:06:34 +01:00
Lennart Poettering
2b717a7f14 update TODO 2024-12-20 18:04:01 +01:00
Lennart Poettering
1c0ade2e1f discover-image: introduce per-user image directories
We nowadays support unprivileged invocation of systemd-nspawn +
systemd-vmspawn, but there was no support for discovering suitable disk
images (i.e. no per-user counterpart of /var/lib/machines). Add this
now, and hook it up everywhere.

Instead of hardcoding machined's, importd's, portabled's, sysupdated's
image discovery to RUNTIME_SCOPE_SYSTEM I introduced a field that make
the scope variable, even if this field is always initialized to
RUNTIME_SCOPE_SYSTEM for now. I think these four services should
eventually be updated to support a per-user concept too, this is
preparation for that, even though it doesn't outright add support for
this.

This is for the largest part not user visible, except for in nspawn,
vmspawn and the dissect tool. For the latter I added a pair of
--user/--system switches to select the discovery scope.
2024-12-20 18:04:01 +01:00
Lennart Poettering
8cbcdc78db update TODO 2024-12-20 17:52:09 +01:00
Lennart Poettering
4103bf9f2f man: document the new per-use credstore paths
(And some other minor tweaks)
2024-12-20 17:52:07 +01:00
Lennart Poettering
026dfd60d4 test: add integration test that makes sure unpriv creds work correctly
This checks both the per-user credstore directory logic, and that
unprivileged, encrypted credentials work.
2024-12-20 17:52:04 +01:00
Lennart Poettering
1af989e8de pid1: add support for decrypting per-user credentials
When I added support for unprivileged credentials I apparently never
hooked them up to service management correctly. Let's fix that.

Fixes: #33796 #33318
2024-12-20 17:52:01 +01:00
Lennart Poettering
8506a9955c execute: introduce a user-scoped credstore
Fixes: #33887
2024-12-20 17:51:58 +01:00
Lennart Poettering
d2cd189324 sd-path: expose credential store in sd-path 2024-12-20 17:51:54 +01:00
Lennart Poettering
b226b7fb6d systemd-path: add the usual ANSI sequences to --help text 2024-12-20 17:51:52 +01:00
Lennart Poettering
060e2512cd systemd-path: guarantee that tool exit status is zero on success
Let's not inherit the error code from an earlier function invocation.
2024-12-20 17:51:50 +01:00
Lennart Poettering
81082f2dc2 systemd-path: order all listed paths by their ID alphabetically
Let's add some system to the madness, given we added user-specific dirs
to the end of the list, but they should really be listed together with
the other user-specific ones.
2024-12-20 17:51:48 +01:00
Lennart Poettering
616586b910 sd-path: don't chop off trailing slash in sd_path apis, when user provided them
This is a minor compat break, but given the slow adoption of the
sd-path.h APIs I think it's one we should take. Basically, the idea is
that if the user provides a suffix path with a trailing slash (thus
encoding in the path that the last element must be a dir), we should
keep it in place, and not suppress it, in order to not willy nilly
reduce the amount of information contained in the path.

Simplifications that do not alter meaning, and do not suppress
information should be fine to apply to a path, but otherwise we really
should be conservative on this.
2024-12-20 17:51:46 +01:00
Lennart Poettering
cf7d0a2d2e pid1: normalize oom error handling a bit 2024-12-20 17:51:42 +01:00
Ricky Tigg
06ffa66a5b po: Translated using Weblate (Finnish)
Currently translated at 100.0% (257 of 257 strings)

Co-authored-by: Ricky Tigg <ricky.tigg@gmail.com>
Translate-URL: https://translate.fedoraproject.org/projects/systemd/main/fi/
Translation: systemd/main
2024-12-21 00:51:18 +09:00
Septatrix
33bfa69b2e Add .venv to gitignore
This directory is commonly used for virtual Python environments.
These are useful when developing to install different Python versions
as well as install tooling like mkosi and mypy in an isolated fashion
without influencing the global system.
2024-12-20 15:33:32 +00:00
Lennart Poettering
f108996319
core/device: handle ID_PROCESSING udev property (#35351)
Continuation of #35332.
2024-12-20 10:12:39 +01:00
Daan De Meyer
dec47e58a6
debug-generator: add a kernel cmdline option to pause the boot process (#35410)
Introduce the `systemd.break=` kernel command line option to allow
stopping the boot process at a certain point and spawn a debug shell.
After exiting this shell, the system will resume booting.

It accepts the following values:
- `pre-udev`: before starting to process kernel uevents (initrd and
host).
- `pre-basic`: before leaving early boot and regular services start
(initrd and host).
- `pre-mount`: before the root filesystem is mounted (initrd).
- `pre-switch-root`: before switching root (initrd).
2024-12-20 10:04:41 +01:00
Lennart Poettering
cdcb1eeeb8
[RFC] better naming for Azure MANA network devices (#34255)
The Azure MANA folks would like the PCI domain to be suppressed from
naming network interfaces. Let's introduce a somewhat generic way to do
this, without hardcoding anything to Azure.

Specifically: we'll ship a new hwdb entry that sets a new
ID_NET_NAME_INCLUDE_DOMAIN=0 property on relevant MANA devices. Then we
make net_id look for that property, and if it is set we simply suppress
the PCI domain.

(Untested as of now, needs feedback from Azure MANA folks that this
actually works and does what is requested here).
2024-12-20 09:52:40 +01:00
Matteo Croce
77d4a263c1 mkosi: move config options
Move some config option in the right section, fixes the following warning:
```
mkosi.conf: Setting Credentials should be configured in [Runtime], not [Host].
mkosi.conf: Setting RuntimeBuildSources should be configured in [Runtime], not [Host].
mkosi.conf: Setting RuntimeScratch should be configured in [Runtime], not [Host].
mkosi.conf: Setting QemuSmp should be configured in [Runtime], not [Host].
mkosi.conf: Setting QemuSwtpm should be configured in [Runtime], not [Host].
mkosi.conf: Setting QemuVsock should be configured in [Runtime], not [Host].
mkosi.conf: Setting QemuKvm should be configured in [Runtime], not [Host].
```
2024-12-20 09:38:11 +01:00
Yu Watanabe
5f29c86ace audit-util: rename output parameter
To make them consistent with in audit-util.c.

Follow-up for 7e02ee98d8.
2024-12-20 09:37:25 +01:00
Yu Watanabe
182ffb5819 TEST-71-HOSTNAME: do not start user session
The user session may trigger hostnamed, and the job of stopping
hostnamed may be cancelled, and the test may fail:
```
[ 4633.613578] TEST-71-HOSTNAME.sh[175]: + stop_hostnamed
[ 4633.613578] TEST-71-HOSTNAME.sh[175]: + systemctl stop systemd-hostnamed.service
[ 4633.664670] systemd[1]: Stopping systemd-hostnamed.service - Hostname Service...
[ 4636.022277] systemd-logind[121]: New session c2 of user root.
[ 4636.032532] systemd[1]: Created slice user-0.slice - User Slice of UID 0.
[ 4636.042675] systemd[1]: Starting user-runtime-dir@0.service - User Runtime Directory /run/user/0...
[ 4636.176140] systemd[1]: Finished user-runtime-dir@0.service - User Runtime Directory /run/user/0.
[ 4636.202951] systemd[1]: Starting user@0.service - User Manager for UID 0...
[ 4636.292204] systemd-logind[121]: New session c3 of user root.
[ 4636.300065] (systemd)[268]: pam_unix(systemd-user:session): session opened for user root(uid=0) by root(uid=0)
[ 4636.757667] systemd[268]: Queued start job for default target default.target.
[ 4636.774419] systemd[268]: Created slice app.slice - User Application Slice.
[ 4636.774579] systemd[268]: Started systemd-tmpfiles-clean.timer - Daily Cleanup of User's Temporary Directories.
[ 4636.774747] systemd[268]: Reached target paths.target - Paths.
[ 4636.776418] systemd[268]: Reached target sysinit.target - System Initialization.
[ 4636.776604] systemd[268]: Reached target timers.target - Timers.
[ 4636.784997] systemd[268]: Starting dbus.socket - D-Bus User Message Bus Socket...
[ 4636.799472] systemd[268]: Starting systemd-tmpfiles-setup.service - Create User Files and Directories...
[ 4637.027125] systemd[268]: Finished systemd-tmpfiles-setup.service - Create User Files and Directories.
[ 4637.031721] systemd[268]: Listening on dbus.socket - D-Bus User Message Bus Socket.
[ 4637.036189] systemd[268]: Reached target sockets.target - Sockets.
[ 4637.036373] systemd[268]: Reached target basic.target - Basic System.
[ 4637.036558] systemd[268]: Reached target default.target - Main User Target.
[ 4637.036646] systemd[268]: Startup finished in 702ms.
[ 4637.049075] systemd[1]: Started user@0.service - User Manager for UID 0.
[ 4637.075263] systemd[1]: Started session-c2.scope - Session c2 of User root.
[ 4637.084917] login[136]: pam_unix(login:session): session opened for user root(uid=0) by root(uid=0)
[ 4637.117348] login[136]: ROOT LOGIN ON pts/0
[ 4637.238572] systemctl[261]: Job for systemd-hostnamed.service canceled.
[ 4637.290369] systemd[1]: TEST-71-HOSTNAME.service: Main process exited, code=exited, status=1/FAILURE
```

Fixes #35643.
2024-12-20 09:36:51 +01:00
Antonio Alvarez Feijoo
e9f781a5a4
debug-generator: add a kernel cmdline option to pause the boot process
Introduce the `systemd.break=` kernel command line option to allow stopping the
boot process at a certain point and spawn a debug shell. After exiting this
shell, the system will resume booting.

It accepts the following values:
- `pre-udev`: before starting to process kernel uevents (initrd and host).
- `pre-basic`: before leaving early boot and regular services start (initrd and
host).
- `pre-mount`: before the root filesystem is mounted (initrd).
- `pre-switch-root`: before switching root (initrd).
2024-12-20 08:51:23 +01:00
Antonio Alvarez Feijoo
cb3801a4c9
man/debug-generator: add a section for kernel command line options 2024-12-20 08:48:23 +01:00
Yu Watanabe
8a135111ca
capability-util: generalize helper to acquire local caps (#35403)
This generalizes and modernizes the code to acquire set of local caps,
based on the code for this in the condition logic. Uses PidRef, and
acquires the full quintuplet of caps.

This can be considered preparation to one day maybe build without
libcap.
2024-12-20 11:52:24 +09:00
Yu Watanabe
5e837858e7
analyze: add "chid" verb to display CHIDs of the local system (#35175)
We already have the code for it, expose it in systemd-analyze, because
it's useful.
2024-12-20 11:47:03 +09:00
Yu Watanabe
a3fecea5e2
Small fixes to nspawn and other stuff (#35686)
Split out ouf #35685.
2024-12-20 11:03:59 +09:00
Yu Watanabe
f01132aacf TEST-17: add test case for ID_PROCESSING flag on add uevent
Also, check the state of the device units on change event.
2024-12-20 10:52:57 +09:00
Yu Watanabe
ad920b4cb3 core/device: handle ID_PROCESSING udev property
If an enumerated device has ID_PROCESSING=1 property, and the service
manager does not know if the device has been processed by udevd
previously (that is, Device.deserialized_found does not have
DEVICE_FOUND_UDEV), then drop DEVICE_FOUND_UDEV flag from the device and
make the device not enter the active state.

Follow-up for 405be62f05, which was
reverted by c4fc22c4de.
2024-12-20 10:52:57 +09:00
Yu Watanabe
a7396f8364 core/device: use path_equal() to compare sysfs path
The hashmap Manager.devices_by_sysfs uses path_hash_ops.
Let's consistent compare function.
2024-12-20 10:52:57 +09:00
Yu Watanabe
71ec342d13 core/device: rename output parameters of device_setup_units() to ret_xyz
No functional change, just refactoring.
2024-12-20 10:52:57 +09:00
Yu Watanabe
3b9010b170
udev: support reloading udev.conf (#35458)
This makes systemd-udevd reload udev.conf when explicitly requested by
e.g. `udevadm control --reload`.
2024-12-20 09:00:48 +09:00
Yu Watanabe
cf89e48028 ptyfwd: reset writable/readable flag before shovel() on exit
Follow-up for 12807b5a49.

Otherwise, if a call of shovel() disabled the flags, the subsequent
calls do nothing even if there is something we need to read or write.

Fixes the following error:
```
Dec 19 02:19:39 run0[5618]: Error on PTY forwarding logic: Too many levels of symbolic links
```
2024-12-20 08:59:41 +09:00
Lennart Poettering
5ceb38cb1e nspawn: switch to read_virtual_file() for reading audit loginuid 2024-12-19 15:37:00 +01:00
Lennart Poettering
312cf91005 nsresource: print nicer error message when trying to acquire an unpriv user ns range where this isn't possible 2024-12-19 15:35:23 +01:00
Lennart Poettering
b1b128d0e2 mount-util: add debug message to make_userns() failure 2024-12-19 15:33:52 +01:00
Lennart Poettering
91cdc8ab0f mount-util: add debug output when we switched root 2024-12-19 15:33:44 +01:00
Lennart Poettering
009a02b263 nspawn: trivial improvements 2024-12-19 15:33:34 +01:00
Lennart Poettering
b83358b87f nspawn: rename pin_fully_visible_fs() → pin_fully_visible_api_fs()
This function pins the *API* FS, i.e. /proc/ + /sys/, not just any fs.
Hence clarify this in the name.

(At least we call these two fs "API (V)FS" in our codebase, hence
continue to do so here)
2024-12-19 15:33:24 +01:00
Lennart Poettering
bf1ef54d30 nspawn: add some additional useful debug logging 2024-12-19 15:33:10 +01:00
Lennart Poettering
8f9ea89ce4 nspawn: make unexpected mkdir() failures fatal
THis is just to make things easier to debug.
2024-12-19 15:33:06 +01:00
Lennart Poettering
5229cd839a nspawn: rename 'fd' variable to something more descriptive 2024-12-19 15:32:50 +01:00
Lennart Poettering
595ca10f37 nspawn: use DEVNUM_FORMAT_STR/DEVNUM_FORMAT_VAL more 2024-12-19 15:32:05 +01:00
Yu Watanabe
ced0ef3b35 TEST-17: use 'udevadm control --reload' or 'systemctl reload systemd-udevd.service' for reloading udev.conf
These should be equivalent. For coverage, one subtest uses systemctl and
another uses udevadm.
2024-12-19 19:00:38 +09:00
Yu Watanabe
e95861d909 udev: also reload udev.conf when explicitly requested
When reloading is explicitly requested, e.g. by 'udevadm control --reload',
then also reload udev.conf.
2024-12-19 18:57:32 +09:00
Yu Watanabe
0f72af536f udev: reload .rules files and builtins only when necessary
Previously, even if e.g. .rules files are unchanged, all .rules files
are reloaded when other kind of config files like .link files or
.hwdb.bin are changed, vice versa.
2024-12-19 18:55:18 +09:00
Lennart Poettering
0e5a83f510 analyze: drop conditioning of --no-legend and --json= on specific verbs
First of all, the list of verbs was badly out of date, in particular for
--no-legend. But second if it, I think such minor switches that alter
some detail of the output should not result in failure when the specific
tweak does not apply on some command. It should be fine for scripts and
suchlike to dumbly always pass --no-legend to all invocations of our
tools without having to consider if a specific subtool of ours actually
supports it or not.
2024-12-18 17:39:22 +01:00
Lennart Poettering
8f114904fc analyze: add verb for showing system's CHIDs
We have the code already, expose it in systemd-analyze too.

This should make it easier to debug the CHID use in the UKIs with
onboard tooling.
2024-12-18 17:38:42 +01:00
Lennart Poettering
25b1a73f71 journald: get rid of get_process_capeff(), use pidref_get_capability() instead
This does pretty much the same, but is nicer, since it parses things
properly.
2024-12-17 19:06:54 +01:00
Lennart Poettering
a5370d35d6 capability-util: introduce capability_is_set() helper 2024-12-17 19:06:54 +01:00
Lennart Poettering
1184626a26 capability-util: generalize helper to acquire local caps
This generalizes and modernizes the code to acquire set of local caps,
based on the code for this in the condition logic. Uses PidRef, and
acquires the full quintuplet of caps.

This can be considered preparation to one day maybe build without
libcap.
2024-12-17 19:06:54 +01:00
Lennart Poettering
9311c28b34 hwdb: disable inclusion of the PCI domain in MANA network interface naming 2024-12-11 16:38:25 +01:00
Lennart Poettering
19491cc90f net_id: depending on new udev prop, include/exclude PCI domain from netif names 2024-12-11 16:37:59 +01:00
150 changed files with 4058 additions and 954 deletions

View File

@ -7,6 +7,14 @@ on:
# Calculate coverage daily at midnight
- cron: '0 0 * * *'
pull_request:
branches:
- main
- v[0-9]+-stable
paths:
- .github/workflows/coverage.yml
- test/integration-test-wrapper.py
permissions:
contents: read
@ -16,7 +24,7 @@ jobs:
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
- uses: systemd/mkosi@07ef37c4c0dad5dfc6cec86c967a7600df1cd88c
- uses: systemd/mkosi@ba07d53000b6c560ad0b9f07550aca93c0284e88
# Freeing up disk space with rm -rf can take multiple minutes. Since we don't need the extra free space
# immediately, we remove the files in the background. However, we first move them to a different location
@ -49,7 +57,6 @@ jobs:
Distribution=arch
[Build]
ToolsTree=default
ToolsTreeDistribution=arch
UseSubvolumes=yes
WithTests=no
@ -64,7 +71,7 @@ jobs:
MESON_OPTIONS=--werror
COVERAGE=1
[Host]
[Runtime]
QemuMem=4G
EOF

View File

@ -113,7 +113,7 @@ jobs:
steps:
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
- uses: systemd/mkosi@c4bbf3b71a3e2cf947995caedf10f69da3c4957a
- uses: systemd/mkosi@ba07d53000b6c560ad0b9f07550aca93c0284e88
# Freeing up disk space with rm -rf can take multiple minutes. Since we don't need the extra free space
# immediately, we remove the files in the background. However, we first move them to a different location
@ -152,7 +152,6 @@ jobs:
[Build]
UseSubvolumes=yes
ToolsTree=default
ToolsTreeDistribution=fedora
ToolsTreeRelease=rawhide
@ -171,7 +170,7 @@ jobs:
[Content]
SELinuxRelabel=${{ matrix.relabel }}
[Host]
[Runtime]
QemuMem=4G
EOF

1
.gitignore vendored
View File

@ -7,6 +7,7 @@
.config.args
.gdb_history
.deps/
.venv/
.mypy_cache/
__pycache__/
/*.gcda

28
TODO
View File

@ -122,6 +122,18 @@ Deprecations and removals:
Features:
* importd: introduce a per-user instance, that downloads into per-user DDI dirs
* sysupdated: similar
* portabled: similar
* machined: implement a per-user instance, that manages per-user DDI dirs for
images. systemd-nspawn/systemd-vmspawn should probably register with both the
system and the user scoped machined instance. The former to get the machine
name registered as hostname, and the latter so that the image stuff is nicely
per-user managed.
* resolved: make resolved process DNR DHCP info
* Teach systemd-ssh-generator to generated an /run/issue.d/ drop-in telling
@ -138,6 +150,16 @@ Features:
* start using STATX_SUBVOL in btrfs_is_subvol(). Also, make use of it
generically, so that image discovery recognizes bcachefs subvols too.
* "systemd-export tar" should reuse the libarchive export code from systemd-dissect
--archive.
* "systemd-import tar" should be moved to libarchive
* foreign uid:
- add support to export-fs, import-fs, import-tar, export-tar
- add tool for deleting foreign UID held container images
- systemd-dissect should learn mappings, too, when doing mtree and such
* format-table: introduce new cell type for strings with ansi sequences in
them. display them in regular output mode (via strip_tab_ansi()), but
suppress them in json mode.
@ -391,8 +413,6 @@ Features:
the bg via vmspawn/nspawn if not done so yet and then requests a shell inside
it for the invoking user.
* importd/…: define per-user dirs for container/VM images too.
* add a new specifier to unit files that figures out the DDI the unit file is
from, tracing through overlayfs, DM, loopback block device.
@ -446,10 +466,6 @@ Features:
* credentials: add a flag to the scoped credentials that if set require PK
reauthentication when unlocking a secret.
* teach systemd --user to properly load credentials off disk, with
/etc/credstore equivalent and similar. Make sure that $CREDENTIALS_DIRECTORY=
actually works too when run with user privs.
* extend the smbios11 logic for passing credentials so that instead of passing
the credential data literally it can also just reference an AF_VSOCK CID/port
to read them from. This way the data doesn't remain in the SMBIOS blob during

View File

@ -7,94 +7,97 @@ SPDX-License-Identifier: LGPL-2.1-or-later
# Hacking on systemd
We welcome all contributions to systemd.
If you notice a bug or a missing feature, please feel invited to fix it, and submit your work as a
We welcome all contributions to systemd. If you notice a bug or a missing
feature, please feel invited to fix it, and submit your work as a
[GitHub Pull Request (PR)](https://github.com/systemd/systemd/pull/new).
Please make sure to follow our [Coding Style](/CODING_STYLE) when submitting patches.
Also have a look at our [Contribution Guidelines](/CONTRIBUTING).
Please make sure to follow our [Coding Style](/CODING_STYLE) when submitting
patches. Also have a look at our [Contribution Guidelines](/CONTRIBUTING).
When adding new functionality, tests should be added.
For shared functionality (in `src/basic/` and `src/shared/`) unit tests should be sufficient.
The general policy is to keep tests in matching files underneath `src/test/`,
e.g. `src/test/test-path-util.c` contains tests for any functions in `src/basic/path-util.c`.
If adding a new source file, consider adding a matching test executable.
For features at a higher level, tests in `src/test/` are very strongly recommended.
If that is not possible, integration tests in `test/` are encouraged.
When adding new functionality, tests should be added. For shared functionality
(in `src/basic/` and `src/shared/`) unit tests should be sufficient. The general
policy is to keep tests in matching files underneath `src/test/`, e.g.
`src/test/test-path-util.c` contains tests for any functions in
`src/basic/path-util.c`. If adding a new source file, consider adding a matching
test executable. For features at a higher level, tests in `src/test/` are very
strongly recommended. If that is not possible, integration tests in `test/` are
encouraged. Please always test your work before submitting a PR.
Please always test your work before submitting a PR.
For many of the components of systemd testing is straightforward as you can simply compile systemd and run the relevant tool from the build directory.
## Hacking on systemd with mkosi
For some components (most importantly, systemd/PID 1 itself) this is not possible, however.
In order to simplify testing for cases like this we provide a set of `mkosi` config files directly in the source tree.
[mkosi](https://mkosi.systemd.io/)
is a tool for building clean OS images from an upstream distribution in combination with a fresh build of the project in the local working directory.
To make use of this, please install `mkosi` from the [GitHub repository](https://github.com/systemd/mkosi#running-mkosi-from-the-repository).
`mkosi` will build an image for the host distro by default.
First, run `mkosi genkey` to generate a key and certificate to be used for secure boot and verity signing.
After that is done, it is sufficient to type `mkosi` in the systemd project directory to generate a disk image you can boot either in `systemd-nspawn` or in a UEFI-capable VM:
[mkosi](https://mkosi.systemd.io/) is our swiss army knife for hacking on
systemd. It makes sure all necessary dependencies are available to build systemd
and allows building and booting an OS image with the latest systemd installed
for testing purposes.
First, install `mkosi` from the
[GitHub repository](https://github.com/systemd/mkosi#running-mkosi-from-the-repository).
Note that it's not possible to use your distribution's packaged version of mkosi
as mkosi has to be installed outside of `/usr` for the following steps to work.
Then, you can build and run systemd executables as follows:
```sh
$ sudo mkosi boot # nspawn still needs sudo for now
$ mkosi -f sandbox meson setup build
$ mkosi -f sandbox ninja -C build
$ mkosi -f sandbox build/systemctl --version
```
or:
To build and boot an OS image with the latest systemd installed:
```sh
$ mkosi qemu
$ mkosi -f genkey # Generate signing keys once.
$ mkosi -f sandbox ninja -C build mkosi # (re-)build the OS image
$ sudo mkosi boot # Boot the image with systemd-nspawn.
$ mkosi qemu # Boot the image with qemu.
```
By default, the tools from your host system are used to build the image.
Sometimes we start using mkosi features that rely on functionality in systemd
tools that's not in an official release yet. In that case, you'll need to build
systemd from source on the host and configure mkosi to use the tools from the
systemd build directory.
To do a local build, most distributions provide very simple and convenient ways
to install most development packages necessary to build systemd:
Putting this all together, here's a series of commands for preparing a patch for
systemd:
```sh
# Fedora
$ sudo dnf builddep systemd
# Debian/Ubuntu
$ sudo apt-get build-dep systemd
# Arch
$ sudo pacman -S devtools
$ pkgctl repo clone --protocol=https systemd
$ git clone https://github.com/systemd/mkosi.git
$ ln -s $PWD/mkosi/bin/mkosi ~/.local/bin/mkosi # Make sure ~/.local/bin is in $PATH.
$ git clone https://github.com/systemd/systemd.git
$ cd systemd
$ makepkg -seoc
$ git checkout -b <BRANCH> # where BRANCH is the name of the branch
$ $EDITOR src/core/main.c # or wherever you'd like to make your changes
$ mkosi -f sandbox meson setup build # Set up meson
$ mkosi -f genkey # Generate signing keys once.
$ mkosi -f sandbox ninja -C build mkosi # (re-)build the test image
$ mkosi qemu # Boot the image in qemu
$ git add -p # interactively put together your patch
$ git commit # commit it
$ git push -u <REMOTE> # where REMOTE is your "fork" on GitHub
```
After installing the development packages, systemd can be built from source as follows:
And after that, head over to your repo on GitHub and click "Compare & pull
request"
```sh
$ meson setup build <options>
$ ninja -C build
$ meson test -C build
```
Happy hacking!
To have `mkosi` use the systemd tools from the `build/` directory, add the
following to `mkosi.local.conf`:
The following sections contain advanced topics on how to speed up development or
streamline debugging. Feel free to read them if you're interested but they're
not required to write basic patches.
## Building the OS image without a tools tree
By default, `mkosi` will first build a tools tree and use it build the image and
provide the environment for `mkosi sandbox`. To disable the tools tree and use
binaries from your host instead, write the following to `mkosi.local.conf`:
```conf
[Host]
ExtraSearchPaths=build/
[Build]
ToolsTree=
```
And if you want `mkosi` to build a tools image and use the tools from there
instead of looking for tools on the host, add the following to
`mkosi.local.conf`:
## Rebuilding systemd without rebuilding the OS image
```conf
[Host]
ToolsTree=default
```
Every time you rerun the `mkosi` command a fresh image is built, incorporating
all current changes you made to the project tree. To build the latest changes
and re-install after booting the image, run one of the following commands in
another terminal on your host (choose the right one depending on the
distribution of the container or virtual machine):
Every time the `mkosi` target is built, a fresh image is built. To build the
latest changes and re-install systemd without rebuilding the image, run one of
the following commands in another terminal on your host after booting the image
(choose the right one depending on the distribution of the container or virtual
machine):
```sh
mkosi -t none && mkosi ssh dnf upgrade --disablerepo="*" --assumeyes "/work/build/*.rpm" # CentOS/Fedora
@ -107,26 +110,6 @@ and optionally restart the daemon(s) you're working on using
`systemctl restart <units>` or `systemctl daemon-reexec` if you're working on
pid1 or `systemctl soft-reboot` to restart everything.
Putting this all together, here's a series of commands for preparing a patch for systemd:
```sh
$ git clone https://github.com/systemd/mkosi.git
$ ln -s $PWD/mkosi/bin/mkosi /usr/local/bin/mkosi
$ git clone https://github.com/systemd/systemd.git
$ cd systemd
$ git checkout -b <BRANCH> # where BRANCH is the name of the branch
$ vim src/core/main.c # or wherever you'd like to make your changes
$ mkosi -f qemu # (re-)build and boot up the test image in qemu
$ mkosi -t none # Build new packages without rebuilding the image
$ git add -p # interactively put together your patch
$ git commit # commit it
$ git push -u <REMOTE> # where REMOTE is your "fork" on GitHub
```
And after that, head over to your repo on GitHub and click "Compare & pull request"
Happy hacking!
## Building distribution packages with mkosi
To build distribution packages for a specific distribution and release without
@ -201,67 +184,6 @@ Those are not useful when compiling for distribution and can be disabled by sett
See [Testing systemd using sanitizers](/TESTING_WITH_SANITIZERS) for more information on how to build with sanitizers enabled in mkosi.
## Fuzzers
systemd includes fuzzers in `src/fuzz/` that use libFuzzer and are automatically run by [OSS-Fuzz](https://github.com/google/oss-fuzz) with sanitizers.
To add a fuzz target, create a new `src/fuzz/fuzz-foo.c` file with a `LLVMFuzzerTestOneInput` function and add it to the list in `src/fuzz/meson.build`.
Whenever possible, a seed corpus and a dictionary should also be added with new fuzz targets.
The dictionary should be named `src/fuzz/fuzz-foo.dict` and the seed corpus should be built and exported as `$OUT/fuzz-foo_seed_corpus.zip` in `tools/oss-fuzz.sh`.
The fuzzers can be built locally if you have libFuzzer installed by running `tools/oss-fuzz.sh`, or by running:
```sh
CC=clang CXX=clang++ \
meson setup build-libfuzz -Dllvm-fuzz=true -Db_sanitize=address,undefined -Db_lundef=false \
-Dc_args='-fno-omit-frame-pointer -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION'
ninja -C build-libfuzz fuzzers
```
Each fuzzer then can be then run manually together with a directory containing the initial corpus:
```
export UBSAN_OPTIONS=print_stacktrace=1:print_summary=1:halt_on_error=1
build-libfuzz/fuzz-varlink-idl test/fuzz/fuzz-varlink-idl/
```
Note: the `halt_on_error=1` UBSan option is especially important,
otherwise the fuzzer won't crash when undefined behavior is triggered.
You should also confirm that the fuzzers can be built and run using
[the OSS-Fuzz toolchain](https://google.github.io/oss-fuzz/advanced-topics/reproducing/#building-using-docker):
```sh
path_to_systemd=...
git clone --depth=1 https://github.com/google/oss-fuzz
cd oss-fuzz
for sanitizer in address undefined memory; do
for engine in libfuzzer afl honggfuzz; do
./infra/helper.py build_fuzzers --sanitizer "$sanitizer" --engine "$engine" \
--clean systemd "$path_to_systemd"
./infra/helper.py check_build --sanitizer "$sanitizer" --engine "$engine" \
-e ALLOWED_BROKEN_TARGETS_PERCENTAGE=0 systemd
done
done
./infra/helper.py build_fuzzers --clean --architecture i386 systemd "$path_to_systemd"
./infra/helper.py check_build --architecture i386 -e ALLOWED_BROKEN_TARGETS_PERCENTAGE=0 systemd
./infra/helper.py build_fuzzers --clean --sanitizer coverage systemd "$path_to_systemd"
./infra/helper.py coverage --no-corpus-download systemd
```
If you find a bug that impacts the security of systemd,
please follow the guidance in [CONTRIBUTING.md](/CONTRIBUTING) on how to report a security vulnerability.
For more details on building fuzzers and integrating with OSS-Fuzz, visit:
- [Setting up a new project - OSS-Fuzz](https://google.github.io/oss-fuzz/getting-started/new-project-guide/)
- [Tutorials - OSS-Fuzz](https://google.github.io/oss-fuzz/reference/useful-links/#tutorials)
## Debugging binaries that need to run as root in vscode
When trying to debug binaries that need to run as root,

View File

@ -129,10 +129,18 @@ possible.
erroneously considers UIDs signed integers, and hence can't deal with values above 2^31.
The `systemd-machined.service` service will synthesize user database records for all UIDs assigned to a running container from this range.
Note for both allocation ranges: when a UID allocation takes place NSS is
checked for collisions first, and a different UID is picked if an entry is found.
Thus, the user database is used as synchronization mechanism to ensure
exclusive ownership of UIDs and UID ranges.
4. 2147352576…2147418111 → UID range used for foreign OS images. For various
usecases (primarily: containers) it makes sense to make foreign OS images
available locally whose UID/GID ownerships do not make sense in the local
context but only within the OS image itself. This 64K UID range can be used
to have a clearly defined ownership even on the host, that can be mapped via
idmapped mount to a dynamic runtime UID range as needed. (These numbers in
hexadecimal are 0x7FFE0000…0x7FFEFFFF.)
Note for the `DynamicUser=` and the `systemd-nspawn` allocation ranges: when a
UID allocation takes place NSS is checked for collisions first, and a different
UID is picked if an entry is found. Thus, the user database is used as
synchronization mechanism to ensure exclusive ownership of UIDs and UID ranges.
To ensure compatibility with other subsystems allocating from the same ranges it is hence essential that they
ensure that whatever they pick shows up in the user/group databases, either by
providing an NSS module, or by adding entries directly to `/etc/passwd` and `/etc/group`.
@ -157,6 +165,8 @@ $ pkg-config --variable=container_uid_base_min systemd
524288
$ pkg-config --variable=container_uid_base_max systemd
1878982656
$ pkg-config --variable=foreign_uid_base systemd
2147352576
```
(Note that the latter encodes the maximum UID *base* `systemd-nspawn` might
@ -164,7 +174,7 @@ pick — given that 64K UIDs are assigned to each container according to this
allocation logic, the maximum UID used for this range is hence
1878982656+65535=1879048191.)
Systemd has compile-time default for these boundaries.
systemd has compile-time default for these boundaries.
Using those defaults is recommended.
It will nevertheless query `/etc/login.defs` at runtime, when compiled with `-Dcompat-mutable-uid-boundaries=true` and that file is present.
Support for this is considered only a compatibility feature and should not be
@ -244,25 +254,27 @@ i.e. somewhere below `/var/` or similar.
## Summary
| UID/GID | Purpose | Defined By | Listed in |
|-----------------------|-----------------------|---------------|-------------------------------|
| 0 | `root` user | Linux | `/etc/passwd` + `nss-systemd` |
| 1…4 | System users | Distributions | `/etc/passwd` |
| 5 | `tty` group | `systemd` | `/etc/passwd` |
| 6…999 | System users | Distributions | `/etc/passwd` |
| 1000…60000 | Regular users | Distributions | `/etc/passwd` + LDAP/NIS/… |
| 60001…60513 | Human users (homed) | `systemd` | `nss-systemd` |
| 60514…60577 | Host users mapped into containers | `systemd` | `systemd-nspawn` |
| 60578…61183 | Unused | | |
| 61184…65519 | Dynamic service users | `systemd` | `nss-systemd` |
| 65520…65533 | Unused | | |
| 65534 | `nobody` user | Linux | `/etc/passwd` + `nss-systemd` |
| 65535 | 16-bit `(uid_t) -1` | Linux | |
| 65536…524287 | Unused | | |
| 524288…1879048191 | Container UID ranges | `systemd` | `nss-systemd` |
| 1879048192…2147483647 | Unused | | |
| 2147483648…4294967294 | HIC SVNT LEONES | | |
| 4294967295 | 32-bit `(uid_t) -1` | Linux | |
| UID/GID | Same in Hexadecimal | How Many | Purpose | Defined By | Listed in |
|----------------------:|----------------------:|-----------:|:----------------------------------|:--------------|:------------------------------|
| 0 | 0x00000000 | 1 | `root` user | Linux | `/etc/passwd` + `nss-systemd` |
| 1…4 | 0x00000001…0x00000004 | 4 | System users | Distributions | `/etc/passwd` |
| 5 | 0x00000005 | 1 | `tty` group | `systemd` | `/etc/passwd` |
| 6…999 | 0x00000006…0x000003E7 | 994 | System users | Distributions | `/etc/passwd` |
| 1000…60000 | 0x000003E8…0x00001770 | 59000 | Regular users | Distributions | `/etc/passwd` + LDAP/NIS/… |
| 60001…60513 | 0x0000EA61…0x0000EC61 | 513 | Human users (homed) | `systemd` | `nss-systemd` |
| 60514…60577 | 0x0000EC62…0x0000ECA1 | 64 | Host users mapped into containers | `systemd` | `systemd-nspawn` |
| 60578…61183 | 0x0000ECA2…0x0000EEFF | 606 | *unused* | | |
| 61184…65519 | 0x0000EF00…0x0000FFEF | 4336 | Dynamic service users | `systemd` | `nss-systemd` |
| 65520…65533 | 0x0000FFF0…0x0000FFFD | 13 | *unused* | | |
| 65534 | 0x0000FFFE | 1 | `nobody` user | Linux | `/etc/passwd` + `nss-systemd` |
| 65535 | 0x0000FFFF | 1 | 16-bit `(uid_t) -1` | Linux | |
| 65536…524287 | 0x00010000…0x0007FFFF | 458752 | *unused* | | |
| 524288…1879048191 | 0x00080000…0x6FFFFFFF | 1878523904 | Container UID ranges | `systemd` | `nss-systemd` |
| 1879048192…2147352575 | 0x70000000…0x7FFDFFFF | 1879048192 | *unused* | | |
| 2147352576…2147418111 | 0x7FFE0000…0x7FFEFFFF | 65536 | Foreign UID range | `systemd` | `nss-systemd` |
| 2147418112…2147483647 | 0x7FFF0000…0x7FFFFFFF | 65536 | *unused* | | |
| 2147483648…4294967294 | 0x80000000…0xFFFFFFFE | 2147483647 | *HIC SVNT LEONES* | | |
| 4294967295 | 0xFFFFFFFF | 1 | 32-bit `(uid_t) -1` | Linux | |
Note that "Unused" in the table above doesn't mean that these ranges are really unused.
It just means that these ranges have no well-established

View File

@ -259,14 +259,17 @@ It's probably wise to use a location string processable by geo-location subsyste
Example: `Berlin, Germany` or `Basement, Room 3a`.
`disposition` → A string, one of `intrinsic`, `system`, `dynamic`, `regular`,
`container`, `reserved`. If specified clarifies the disposition of the user,
`container`, `foreign`, `reserved`. If specified clarifies the disposition of the user,
i.e. the context it is defined in.
For regular, "human" users this should be `regular`, for system users (i.e. users that system services run under, and similar) this should be `system`.
The `intrinsic` disposition should be used only for the two users that have special meaning to the OS kernel itself,
i.e. the `root` and `nobody` users.
The `container` string should be used for users that are used by an OS container, and hence will show up in `ps` listings
and such, but are only defined in container context.
Finally `reserved` should be used for any users outside of these use-cases.
The `foreign` string should be used for users from UID ranges which are used
for OS images from foreign systems, i.e. where local resolution would not make
sense.
Finally, `reserved` should be used for any users outside of these use-cases.
Note that this property is entirely optional and applications are assumed to be able to derive the
disposition of a user automatically from a record even in absence of this
field, based on other fields, for example the numeric UID. By setting this

View File

@ -3,3 +3,7 @@
# Dell iDRAC Virtual USB NIC
usb:v413CpA102*
ID_NET_NAME_FROM_DATABASE=idrac
# Disable inclusion of PCI domain in interface names on Azure MANA
pci:v00001414d000000BA*
ID_NET_NAME_INCLUDE_DOMAIN=0

View File

@ -97,6 +97,18 @@
</listitem>
</varlistentry>
<varlistentry>
<term><varname>systemd.break=</varname></term>
<term><varname>rd.systemd.break=</varname></term>
<listitem>
<para>Parameters understood by
<citerefentry><refentrytitle>systemd-debug-generator</refentrytitle><manvolnum>8</manvolnum></citerefentry>,
to pause the boot process at a certain point and spawn a debug shell.</para>
<xi:include href="version-info.xml" xpointer="v258"/>
</listitem>
</varlistentry>
<varlistentry>
<term><varname>systemd.run=</varname></term>
<term><varname>systemd.run_success_action=</varname></term>

View File

@ -147,6 +147,8 @@ node /org/freedesktop/systemd1 {
AttachProcessesToUnit(in s unit_name,
in s subcgroup,
in au pids);
RemoveSubgroupFromUnit(in s unit_name,
in s subcgroup);
AbandonScope(in s name);
GetJob(in u id,
out o job);
@ -870,6 +872,8 @@ node /org/freedesktop/systemd1 {
<variablelist class="dbus-method" generated="True" extra-ref="AttachProcessesToUnit()"/>
<variablelist class="dbus-method" generated="True" extra-ref="RemoveSubgroupFromUnit()"/>
<variablelist class="dbus-method" generated="True" extra-ref="AbandonScope()"/>
<variablelist class="dbus-method" generated="True" extra-ref="GetJob()"/>
@ -1599,6 +1603,12 @@ node /org/freedesktop/systemd1 {
parameters. The possible values are <literal>configuration</literal>, <literal>state</literal>,
<literal>logs</literal>, <literal>cache</literal>, <literal>runtime</literal>,
<literal>fdstore</literal>, and <literal>all</literal>.</para>
<para><function>RemoveSubgroupFromUnit()</function> removes a subcgroup belonging to a unit's
cgroup. Takes two arguments: the unit name (if empty defaults to the caller's unit), and a cgroup path
(which must start start with a slash <literal>/</literal>), which is taken relative to the unit's
cgroup. This is primarily useful for unprivileged service managers to ask the system service manager
for removal of subcgroups it manages, in case one was delegated to other UIDs.</para>
</refsect2>
<refsect2>
@ -2704,6 +2714,7 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2eservice {
GetProcesses(out a(sus) processes);
AttachProcesses(in s subcgroup,
in au pids);
RemoveSubgroup(in s subcgroup);
properties:
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly s Type = '...';
@ -3398,6 +3409,8 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2eservice {
<!--method AttachProcesses is not documented!-->
<!--method RemoveSubgroup is not documented!-->
<!--property Type is not documented!-->
<!--property ExitType is not documented!-->
@ -4006,6 +4019,8 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2eservice {
<variablelist class="dbus-method" generated="True" extra-ref="AttachProcesses()"/>
<variablelist class="dbus-method" generated="True" extra-ref="RemoveSubgroup()"/>
<variablelist class="dbus-property" generated="True" extra-ref="Type"/>
<variablelist class="dbus-property" generated="True" extra-ref="ExitType"/>
@ -4901,6 +4916,7 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2esocket {
GetProcesses(out a(sus) processes);
AttachProcesses(in s subcgroup,
in au pids);
RemoveSubgroup(in s subcgroup);
properties:
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly s BindIPv6Only = '...';
@ -5592,6 +5608,8 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2esocket {
<!--method AttachProcesses is not documented!-->
<!--method RemoveSubgroup is not documented!-->
<!--property BindIPv6Only is not documented!-->
<!--property Backlog is not documented!-->
@ -6206,6 +6224,8 @@ node /org/freedesktop/systemd1/unit/avahi_2ddaemon_2esocket {
<variablelist class="dbus-method" generated="True" extra-ref="AttachProcesses()"/>
<variablelist class="dbus-method" generated="True" extra-ref="RemoveSubgroup()"/>
<variablelist class="dbus-property" generated="True" extra-ref="BindIPv6Only"/>
<variablelist class="dbus-property" generated="True" extra-ref="Backlog"/>
@ -7001,6 +7021,7 @@ node /org/freedesktop/systemd1/unit/home_2emount {
GetProcesses(out a(sus) processes);
AttachProcesses(in s subcgroup,
in au pids);
RemoveSubgroup(in s subcgroup);
properties:
@org.freedesktop.DBus.Property.EmitsChangedSignal("const")
readonly s Where = '...';
@ -7601,6 +7622,8 @@ node /org/freedesktop/systemd1/unit/home_2emount {
<!--method AttachProcesses is not documented!-->
<!--method RemoveSubgroup is not documented!-->
<!--property Where is not documented!-->
<!--property What is not documented!-->
@ -8141,6 +8164,8 @@ node /org/freedesktop/systemd1/unit/home_2emount {
<variablelist class="dbus-method" generated="True" extra-ref="AttachProcesses()"/>
<variablelist class="dbus-method" generated="True" extra-ref="RemoveSubgroup()"/>
<variablelist class="dbus-property" generated="True" extra-ref="Where"/>
<variablelist class="dbus-property" generated="True" extra-ref="What"/>
@ -8991,6 +9016,7 @@ node /org/freedesktop/systemd1/unit/dev_2dsda3_2eswap {
GetProcesses(out a(sus) processes);
AttachProcesses(in s subcgroup,
in au pids);
RemoveSubgroup(in s subcgroup);
properties:
readonly s What = '...';
readonly i Priority = ...;
@ -9577,6 +9603,8 @@ node /org/freedesktop/systemd1/unit/dev_2dsda3_2eswap {
<!--method AttachProcesses is not documented!-->
<!--method RemoveSubgroup is not documented!-->
<!--property What is not documented!-->
<!--property Priority is not documented!-->
@ -10103,6 +10131,8 @@ node /org/freedesktop/systemd1/unit/dev_2dsda3_2eswap {
<variablelist class="dbus-method" generated="True" extra-ref="AttachProcesses()"/>
<variablelist class="dbus-method" generated="True" extra-ref="RemoveSubgroup()"/>
<variablelist class="dbus-property" generated="True" extra-ref="What"/>
<variablelist class="dbus-property" generated="True" extra-ref="Priority"/>
@ -10805,6 +10835,7 @@ node /org/freedesktop/systemd1/unit/system_2eslice {
GetProcesses(out a(sus) processes);
AttachProcesses(in s subcgroup,
in au pids);
RemoveSubgroup(in s subcgroup);
properties:
@org.freedesktop.DBus.Property.EmitsChangedSignal("false")
readonly s Slice = '...';
@ -11004,6 +11035,8 @@ node /org/freedesktop/systemd1/unit/system_2eslice {
<!--method AttachProcesses is not documented!-->
<!--method RemoveSubgroup is not documented!-->
<!--property Slice is not documented!-->
<!--property ControlGroupId is not documented!-->
@ -11196,6 +11229,8 @@ node /org/freedesktop/systemd1/unit/system_2eslice {
<variablelist class="dbus-method" generated="True" extra-ref="AttachProcesses()"/>
<variablelist class="dbus-method" generated="True" extra-ref="RemoveSubgroup()"/>
<variablelist class="dbus-property" generated="True" extra-ref="Slice"/>
<variablelist class="dbus-property" generated="True" extra-ref="ControlGroup"/>
@ -11411,6 +11446,7 @@ node /org/freedesktop/systemd1/unit/session_2d1_2escope {
GetProcesses(out a(sus) processes);
AttachProcesses(in s subcgroup,
in au pids);
RemoveSubgroup(in s subcgroup);
signals:
RequestStop();
properties:
@ -11636,6 +11672,8 @@ node /org/freedesktop/systemd1/unit/session_2d1_2escope {
<!--method AttachProcesses is not documented!-->
<!--method RemoveSubgroup is not documented!-->
<!--property RuntimeMaxUSec is not documented!-->
<!--property RuntimeRandomizedExtraUSec is not documented!-->
@ -11850,6 +11888,8 @@ node /org/freedesktop/systemd1/unit/session_2d1_2escope {
<variablelist class="dbus-method" generated="True" extra-ref="AttachProcesses()"/>
<variablelist class="dbus-method" generated="True" extra-ref="RemoveSubgroup()"/>
<variablelist class="dbus-signal" generated="True" extra-ref="RequestStop()"/>
<variablelist class="dbus-property" generated="True" extra-ref="Controller"/>
@ -12254,6 +12294,7 @@ $ gdbus introspect --system --dest org.freedesktop.systemd1 \
<varname>ShutdownStartTimestamp</varname>,
<varname>ShutdownStartTimestampMonotonic</varname>, and
<varname>SoftRebootsCount</varname> were added in version 256.</para>
<para><function>RemoveSubgroupFromUnit()</function> was added in version 258.</para>
</refsect2>
<refsect2>
<title>Unit Objects</title>
@ -12320,7 +12361,7 @@ $ gdbus introspect --system --dest org.freedesktop.systemd1 \
<varname>ProtectControlGroupsEx</varname>,
<varname>PrivateUsersEx</varname>, and
<varname>PrivatePIDs</varname> were added in version 257.</para>
<para><varname>ProtectHostnameEx</varname> was added in version 258.</para>
<para><varname>ProtectHostnameEx</varname> and <function>RemoveSubGroup()</function> were added in version 258.</para>
</refsect2>
<refsect2>
<title>Socket Unit Objects</title>
@ -12364,7 +12405,7 @@ $ gdbus introspect --system --dest org.freedesktop.systemd1 \
<varname>ManagedOOMMemoryPressureDurationUSec</varname>,
<varname>ProtectControlGroupsEx</varname>, and
<varname>PrivatePIDs</varname> were added in version 257.</para>
<para><varname>ProtectHostnameEx</varname> was added in version 258.</para>
<para><varname>ProtectHostnameEx</varname> and <function>RemoveSubgroup()</function> were added in version 258.</para>
</refsect2>
<refsect2>
<title>Mount Unit Objects</title>
@ -12405,7 +12446,7 @@ $ gdbus introspect --system --dest org.freedesktop.systemd1 \
<varname>ManagedOOMMemoryPressureDurationUSec</varname>,
<varname>ProtectControlGroupsEx</varname>, and
<varname>PrivatePIDs</varname> were added in version 257.</para>
<para><varname>ProtectHostnameEx</varname> was added in version 258.</para>
<para><varname>ProtectHostnameEx</varname> and <function>RemoveSubgroup()</function> was added in version 258.</para>
</refsect2>
<refsect2>
<title>Swap Unit Objects</title>
@ -12446,7 +12487,7 @@ $ gdbus introspect --system --dest org.freedesktop.systemd1 \
<varname>ManagedOOMMemoryPressureDurationUSec</varname>,
<varname>ProtectControlGroupsEx</varname>, and
<varname>PrivatePIDs</varname> were added in version 257.</para>
<para><varname>ProtectHostnameEx</varname> was added in version 258.</para>
<para><varname>ProtectHostnameEx</varname> and <function>RemoveSubgroup()</function> were added in version 258.</para>
</refsect2>
<refsect2>
<title>Slice Unit Objects</title>
@ -12472,6 +12513,7 @@ $ gdbus introspect --system --dest org.freedesktop.systemd1 \
<varname>EffectiveTasksMax</varname>, and
<varname>MemoryZSwapWriteback</varname> were added in version 256.</para>
<para><varname>ManagedOOMMemoryPressureDurationUSec</varname> was added in version 257.</para>
<para><function>RemoveSubgroup()</function> was added in version 258.</para>
</refsect2>
<refsect2>
<title>Scope Unit Objects</title>
@ -12498,6 +12540,7 @@ $ gdbus introspect --system --dest org.freedesktop.systemd1 \
<varname>EffectiveTasksMax</varname>, and
<varname>MemoryZSwapWriteback</varname> were added in version 256.</para>
<para><varname>ManagedOOMMemoryPressureDurationUSec</varname> was added in version 257.</para>
<para><function>RemoveSubgroup()</function> was added in version 258.</para>
</refsect2>
<refsect2>
<title>Job Objects</title>

View File

@ -205,6 +205,11 @@
<arg choice="opt" rep="repeat">OPTIONS</arg>
<arg choice="plain">smbios11</arg>
</cmdsynopsis>
<cmdsynopsis>
<command>systemd-analyze</command>
<arg choice="opt" rep="repeat">OPTIONS</arg>
<arg choice="plain">chid</arg>
</cmdsynopsis>
</refsynopsisdiv>
<refsect1>
@ -1084,6 +1089,37 @@ io.systemd.credential:vmm.notify_socket=vsock-stream:2:254570042
<xi:include href="version-info.xml" xpointer="v257"/>
</refsect2>
<refsect2>
<title><command>systemd-analyze chid</command></title>
<para>Shows a list of Computer Hardware IDs (CHIDs) of the local system. These IDs identify the
system's computer hardware, based on SMBIOS data. See <ulink
url="https://learn.microsoft.com/en-us/windows-hardware/drivers/dashboard/using-chids">Using Computer
Hardware IDs (CHIDs)</ulink> for details about CHIDs.</para>
<example>
<title>Example output</title>
<programlisting>$ systemd-analyze chid
TYPE INPUT CHID
3 MFPSmp 520537c0-3b59-504f-b062-9682ea236b21
4 MFPS-- edf05dc8-a53d-5b2c-8023-630bca2a2463
5 MFP--- ebc6a4d9-ec48-537a-916b-c69fa4fdd814
6 M--Smp 5ebe4bba-f598-5e90-9ff2-9fd0d3211465
7 M--S-- 1a3fb835-b42a-5f9c-a38c-eff5bfd5c41d
8 M-P-mp 2a831dce-8163-5bad-8406-435b8c752dd8
9 M-P--- 7c21c878-4a75-50f7-9816-21e811588da0
10 MF--mp 9a003537-bcc5-500e-b10a-8d8892e4fc64
11 MF---- bb9122bb-8a5c-50d2-a742-a85beb719909
13 M---mp bfc36935-5032-5987-a0a3-6311f01de33a
LEGEND: M → sys_vendor (LENOVO) ┄ F → product_family (ThinkPad X1 Carbon Gen 9) ┄ P → product_name (20XW0055GE)
S → product_sku (LENOVO_MT_20XW_BU_Think_FM_ThinkPad X1 Carbon Gen 9) ┄ m → board_vendor (LENOVO)
p → board_name (20XW0055GE)</programlisting>
</example>
<xi:include href="version-info.xml" xpointer="v258"/>
</refsect2>
</refsect1>
<refsect1>

View File

@ -31,45 +31,131 @@
<refsect1>
<title>Description</title>
<para><filename>systemd-debug-generator</filename> is a generator
that reads the kernel command line and understands three
options:</para>
<para><command>systemd-debug-generator</command> is a generator that provides some debugging
functionality.</para>
<para>If the <option>systemd.mask=</option> or <option>rd.systemd.mask=</option>
option is specified and followed by a unit name, this unit is
masked for the runtime (i.e. for this session — from boot to shutdown), similarly to the effect of
<citerefentry><refentrytitle>systemctl</refentrytitle><manvolnum>1</manvolnum></citerefentry>'s
<command>mask</command> command. This is useful to boot with
certain units removed from the initial boot transaction for
debugging system startup. May be specified more than once.
<option>rd.systemd.mask=</option> is honored only by initial
RAM disk (initrd) while <option>systemd.mask=</option> is
honored only in the main system.</para>
<para>If the <option>systemd.wants=</option> or
<option>rd.systemd.wants=</option> option is specified
and followed by a unit name, a start job for this unit is added to
the initial transaction. This is useful to start one or more
additional units at boot. May be specified more than once.
<option>rd.systemd.wants=</option> is honored only by initial
RAM disk (initrd) while <option>systemd.wants=</option> is
honored only in the main system.</para>
<para>If the <option>systemd.debug_shell</option> or <option>rd.systemd.debug_shell</option> option is
specified, the debug shell service <literal>debug-shell.service</literal> is pulled into the boot
transaction and a debug shell will be spawned during early boot. By default,
<filename>&DEBUGTTY;</filename> is used, but a specific tty can also be specified, either with or without
the <filename>/dev/</filename> prefix. To set the tty to use without enabling the debug shell, the
<option>systemd.default_debug_tty=</option> option can be used which also takes a tty with or without the
<filename>/dev/</filename> prefix. Note that the shell may also be turned on persistently by enabling it
with <citerefentry><refentrytitle>systemctl</refentrytitle><manvolnum>1</manvolnum></citerefentry>'s
<command>enable</command> command. <option>rd.systemd.debug_shell</option> is honored only by initial
RAM disk (initrd) while <option>systemd.debug_shell</option> is honored only in the main system.</para>
<para><filename>systemd-debug-generator</filename> implements
<para><command>systemd-debug-generator</command> implements
<citerefentry><refentrytitle>systemd.generator</refentrytitle><manvolnum>7</manvolnum></citerefentry>.</para>
</refsect1>
<refsect1>
<title>Kernel Command Line</title>
<para><command>systemd-debug-generator</command> understands the following kernel command line
parameters:</para>
<variablelist class='kernel-commandline-options'>
<varlistentry>
<term><varname>systemd.mask=</varname></term>
<term><varname>rd.systemd.mask=</varname></term>
<listitem><para>These options take a unit name as argument. The unit specified is masked for the
runtime (i.e. for this session — from boot to shutdown), similarly to the effect of
<citerefentry><refentrytitle>systemctl</refentrytitle><manvolnum>1</manvolnum></citerefentry>'s
<command>mask</command> command. This is useful to boot with certain units removed from the initial
boot transaction for debugging system startup. May be specified more than once. The option prefixed
with <literal>rd.</literal> is honored only in the initrd, while the one without prefix is only
honored in the main system.</para>
<xi:include href="version-info.xml" xpointer="v215"/></listitem>
</varlistentry>
<varlistentry>
<term><varname>systemd.wants=</varname></term>
<term><varname>rd.systemd.wants=</varname></term>
<listitem><para>These options take a unit name as argument. A start job for this unit is added to the
initial transaction. This is useful to start one or more additional units at boot. May be specified
more than once. The option prefixed with <literal>rd.</literal> is honored only in the initrd, while
the one that is not prefixed only in the main system.</para>
<xi:include href="version-info.xml" xpointer="v215"/></listitem>
</varlistentry>
<varlistentry>
<term><varname>systemd.debug_shell</varname></term>
<term><varname>rd.systemd.debug_shell</varname></term>
<term><varname>systemd.default_debug_tty=</varname></term>
<term><varname>rd.systemd.default_debug_tty=</varname></term>
<listitem><para>If the <option>systemd.debug_shell</option> or
<option>rd.systemd.debug_shell</option> option is specified, the debug shell service
<literal>debug-shell.service</literal> is pulled into the boot transaction and a debug shell will be
spawned during early boot. By default, <filename>&DEBUGTTY;</filename> is used, but a specific tty
can also be specified, either with or without the <filename>/dev/</filename> prefix. To set the tty
to use without enabling the debug shell, the <option>systemd.default_debug_tty=</option> option can
be used which also takes a tty with or without the <filename>/dev/</filename> prefix. Note that the
shell may also be turned on persistently by enabling it with
<citerefentry><refentrytitle>systemctl</refentrytitle><manvolnum>1</manvolnum></citerefentry>'s
<command>enable</command> command. The options prefixed with <literal>rd.</literal> are honored only
in the initrd, while the ones without prefix are only honored in the main system.</para>
<xi:include href="version-info.xml" xpointer="v215"/></listitem>
</varlistentry>
<varlistentry>
<term><varname>systemd.break=</varname></term>
<term><varname>rd.systemd.break=</varname></term>
<listitem><para>Takes one of <option>pre-udev</option>, <option>pre-basic</option>,
<option>pre-mount</option>, or <option>pre-switch-root</option> (the default for the
<literal>rd.</literal> option). It also accepts multiple values separated by comma
(<literal>,</literal>). These options allow to pause the boot process at a certain point and spawn a
debug shell. After exiting this shell, the system will resume booting. The option prefixed with
<literal>rd.</literal> is honored only in the initrd, while the one without prefix is only honored in
the main system.</para>
<table>
<title>Available breakpoints</title>
<tgroup cols='4'>
<colspec colname='breakpoint' />
<colspec colname='description' />
<colspec colname='initrd' />
<colspec colname='main' />
<thead>
<row>
<entry>Breakpoints</entry>
<entry>Description</entry>
<entry>Can be used in the initrd</entry>
<entry>Can be used in the main system</entry>
</row>
</thead>
<tbody>
<row>
<entry><option>pre-udev</option></entry>
<entry>Before starting to process kernel uevents, i.e., before <filename>systemd-udevd.service</filename> starts.</entry>
<entry></entry>
<entry></entry>
</row>
<row>
<entry><option>pre-basic</option></entry>
<entry>Before leaving early boot and regular services start, i.e., before <filename>basic.target</filename> is reached.</entry>
<entry></entry>
<entry></entry>
</row>
<row>
<entry><option>pre-mount</option></entry>
<entry>Before the root filesystem is mounted, i.e., before <filename>sysroot.mount</filename> starts.</entry>
<entry></entry>
<entry></entry>
</row>
<row>
<entry><option>pre-switch-root</option></entry>
<entry>Before switching from the initrd to the real root.</entry>
<entry></entry>
<entry></entry>
</row>
</tbody>
</tgroup>
</table>
<xi:include href="version-info.xml" xpointer="v258"/></listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>System Credentials</title>
@ -108,6 +194,8 @@
<member><citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemctl</refentrytitle><manvolnum>1</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>kernel-command-line</refentrytitle><manvolnum>7</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>systemd.system-credentials</refentrytitle><manvolnum>7</manvolnum></citerefentry></member>
<member><citerefentry><refentrytitle>bootup</refentrytitle><manvolnum>7</manvolnum></citerefentry></member>
</simplelist></para>
</refsect1>

View File

@ -62,6 +62,9 @@
<cmdsynopsis>
<command>systemd-dissect</command> <arg choice="opt" rep="repeat">OPTIONS</arg> <arg>--validate</arg> <arg choice="plain"><replaceable>IMAGE</replaceable></arg>
</cmdsynopsis>
<cmdsynopsis>
<command>systemd-dissect</command> <arg choice="opt" rep="repeat">OPTIONS</arg> <arg>--shift</arg> <arg choice="plain"><replaceable>IMAGE</replaceable></arg> <arg choice="plain"><replaceable>UIDBASE</replaceable></arg>
</cmdsynopsis>
</refsynopsisdiv>
<refsect1>
@ -350,6 +353,27 @@
<xi:include href="version-info.xml" xpointer="v254"/></listitem>
</varlistentry>
<varlistentry>
<term><option>--shift</option></term>
<listitem><para>Recursively iterates through all inodes of the specified image and shifts the UIDs
and GIDs the inodes are owned by into the specified UID range. Takes an image path and a UID base as
parameter. The UID base can be specified numerically (in which case it must be a multiple of 65536,
and either 0 or within the container or foreign UID range, as per <ulink
url="https://systemd.io/UIDS-GIDS/">Users, Groups, UIDs and GIDs on systemd Systems</ulink>), or as
the symbolic identifier <literal>foreign</literal> which is shorthand to the foreign UID base. This
command is useful for preparing directory container images for unprivileged use. Note that this
command is intended for images that use the 16bit UIDs/GIDs range only, and it always ignores the
upper 16bit of the current UID/GID ownership, combining the lower 16 bit with the target UID
base.</para>
<para>Use <command>systemd-dissect --shift /some/container/tree foreign</command> to shift a
container image into the foreign UID range, or <command>systemd-dissect --shift /some/container/tree
0</command> to shift it to host UID range.</para>
<xi:include href="version-info.xml" xpointer="v258"/></listitem>
</varlistentry>
<xi:include href="standard-options.xml" xpointer="help" />
<xi:include href="standard-options.xml" xpointer="version" />
</variablelist>
@ -503,6 +527,26 @@
<xi:include href="version-info.xml" xpointer="v254"/></listitem>
</varlistentry>
<varlistentry>
<term><option>--system</option></term>
<term><option>--user</option></term>
<listitem><para>When used together with <option>--discover</option> controls whether to search for
images installed system-wide or in the user's directories in <varname>$HOME</varname>. If neither
switch is specified, will search within both scopes.</para>
<xi:include href="version-info.xml" xpointer="v258"/></listitem>
</varlistentry>
<varlistentry>
<term><option>--all</option></term>
<listitem><para>If combined with <option>--discover</option>, also shows images that start with a
dot, i.e. hidden images.</para>
<xi:include href="version-info.xml" xpointer="v258"/></listitem>
</varlistentry>
<xi:include href="standard-options.xml" xpointer="image-policy-open" />
<xi:include href="standard-options.xml" xpointer="no-pager" />
<xi:include href="standard-options.xml" xpointer="no-legend" />

View File

@ -841,6 +841,12 @@
host and container UIDs/GIDs are chosen identically it does provide process capability isolation,
but may be useful if proper user namespacing with distinct UID maps is not possible. This option is
not secure and must not be used to run untrusted code.</para></listitem>
<listitem><para>If the parameter is <literal>managed</literal>, user namespacing is employed with
in <emphasis>managed</emphasis> mode, i.e. allocation of a UID range is delegated to
<citerefentry><refentrytitle>systemd-nsresourced.service</refentrytitle><manvolnum>8</manvolnum></citerefentry>. This
mode is selected by default if invoked unprivileged, but can also be requested explicitly when
privileged. In this mode a 64K UID range is automatically picked.</para></listitem>
</orderedlist>
<para>It is recommended to assign at least 65536 UIDs/GIDs to each container, so that the usable
@ -852,18 +858,23 @@
<para>When user namespaces are used, the GID range assigned to each container is always chosen
identical to the UID range.</para>
<para>In most cases, using <option>--private-users=pick</option> is the recommended option as user
namespacing is required for security, and this option massively enhances container security while
<para>In most cases, <option>--private-users=managed</option> (or when privileged
<option>--private-users=pick</option>, too) is the recommended option as user
namespacing is advised for security, and this option massively enhances container security while
operating fully automatically in most cases.</para>
<para>Note that the picked UID/GID range is not written to <filename>/etc/passwd</filename> or
<filename>/etc/group</filename>. In fact, the allocation of the range is not stored persistently,
except in the file ownership of the files and directories of the container.</para>
except possibly in the file ownership of the files and directories of the container, see
<option>--private-users-ownership=</option>.</para>
<para>Note that when user namespacing is used file ownership on disk reflects this, and all of the container's
files and directories are owned by the container's effective user and group IDs. This means that copying files
from and to the container image requires correction of the numeric UID/GID values, according to the UID/GID
shift applied.</para>
<para>Note that when user namespacing is used without UID mapping (see below) file ownership on disk
reflects this, and all of the container's files and directories are owned by the container's
effective user and group IDs. This means that copying files from and to the container image requires
correction of the numeric UID/GID values, according to the UID/GID shift applied.</para>
<para>Note that for fully unprivileged operation in <literal>managed</literal> mode, any directory
image should be ownd by the foreign UID range.</para>
<xi:include href="version-info.xml" xpointer="v220"/></listitem>
</varlistentry>
@ -875,8 +886,10 @@
chosen with <option>--private-users=</option>, see above. Takes one of <literal>off</literal> (to
leave the image as is), <literal>chown</literal> (to recursively <function>chown()</function> the
container's directory tree as needed), <literal>map</literal> (in order to use transparent ID mapping
mounts) or <literal>auto</literal> for automatically using <literal>map</literal> where available and
<literal>chown</literal> where not.</para>
mounts from UID 0 to the target UID range), <literal>foreign</literal> (the same, but from the
foreign UID range base) or <literal>auto</literal> for automatically using <literal>map</literal> or
<literal>foreign</literal>, where available and applicable and <literal>chown</literal> where
not.</para>
<para>If <literal>chown</literal> is selected, all files and directories in the container's directory
tree will be adjusted so that they are owned by the appropriate UIDs/GIDs selected for the container
@ -884,14 +897,19 @@
directory tree of the container. Besides actual file ownership, file ACLs are adjusted as
well.</para>
<para>Typically <literal>map</literal> is the best choice, since it transparently maps UIDs/GIDs in
memory as needed without modifying the image, and without requiring an expensive recursive adjustment
operation. However, it is not available for all file systems, currently.</para>
<para>Typically <literal>foreign</literal> or <literal>map</literal> is the best choice, since it
transparently maps UIDs/GIDs in memory as needed without modifying the image, and without requiring
an expensive recursive adjustment operation. However, it is not available for all file systems,
currently.</para>
<para>The <option>--private-users-ownership=auto</option> option is implied if
<option>--private-users=pick</option> is used. This option has no effect if user namespacing is not
used.</para>
<para><citerefentry><refentrytitle>systemd-dissect</refentrytitle><manvolnum>1</manvolnum></citerefentry>'s
<option>--shift</option> switch may be used to shift UID/GID ownership from or to the 0, foreign or
specific container UID/GID base outside of any <command>systemd-nspawn</command></para> invocation.
<xi:include href="version-info.xml" xpointer="v230"/></listitem>
</varlistentry>

View File

@ -3468,37 +3468,43 @@ StandardInputData=V2XigLJyZSBubyBzdHJhbmdlcnMgdG8gbG92ZQpZb3Uga25vdyB0aGUgcnVsZX
<term><varname>LoadCredentialEncrypted=</varname><replaceable>ID</replaceable><optional>:<replaceable>PATH</replaceable></optional></term>
<listitem><para>Pass a credential to the unit. Credentials are limited-size binary or textual objects
that may be passed to unit processes. They are primarily used for passing cryptographic keys (both
public and private) or certificates, user account information or identity information from host to
services. The data is accessible from the unit's processes via the file system, at a read-only
location that (if possible and permitted) is backed by non-swappable memory. The data is only
accessible to the user associated with the unit, via the
<varname>User=</varname>/<varname>DynamicUser=</varname> settings (as well as the superuser). When
available, the location of credentials is exported as the <varname>$CREDENTIALS_DIRECTORY</varname>
environment variable to the unit's processes.</para>
that may be passed to unit processes. They are primarily intended for passing cryptographic keys
(both public and private) or certificates, user account information or identity information from host
to services, but can be freely used to pass any kind of limited-size information to a service. The
data is accessible from the unit's processes via the file system, at a read-only location that (if
possible and permitted) is backed by non-swappable memory. The data is only accessible to the user
associated with the unit, via the <varname>User=</varname>/<varname>DynamicUser=</varname> settings
(as well as the superuser). When available, the location of credentials is exported as the
<varname>$CREDENTIALS_DIRECTORY</varname> environment variable to the unit's processes.</para>
<para>The <varname>LoadCredential=</varname> setting takes a textual ID to use as name for a
credential plus a file system path, separated by a colon. The ID must be a short ASCII string
suitable as filename in the filesystem, and may be chosen freely by the user. If the specified path
is absolute it is opened as regular file and the credential data is read from it. If the absolute
path refers to an <constant>AF_UNIX</constant> stream socket in the file system a connection is made
to it (only once at unit start-up) and the credential data read from the connection, providing an
to it (once at process invocation) and the credential data read from the connection, providing an
easy IPC integration point for dynamically transferring credentials from other services.</para>
<para>If the specified path is not absolute and itself qualifies as valid credential identifier it is
attempted to find a credential that the service manager itself received under the specified name —
which may be used to propagate credentials from an invoking environment (e.g. a container manager
that invoked the service manager) into a service. If no matching system credential is found, the
directories <filename>/etc/credstore/</filename>, <filename>/run/credstore/</filename> and
<filename>/usr/lib/credstore/</filename> are searched for files under the credential's name — which
hence are recommended locations for credential data on disk. If
that invoked the service manager) into a service. If no matching passed credential is found, the
system service manager will search the directories <filename>/etc/credstore/</filename>,
<filename>/run/credstore/</filename> and <filename>/usr/lib/credstore/</filename> for files under the
credential's name — which hence are recommended locations for credential data on disk. If
<varname>LoadCredentialEncrypted=</varname> is used <filename>/run/credstore.encrypted/</filename>,
<filename>/etc/credstore.encrypted/</filename>, and
<filename>/usr/lib/credstore.encrypted/</filename> are searched as well.</para>
<filename>/usr/lib/credstore.encrypted/</filename> are searched as well. The per-user service manager
will search <filename>$XDG_CONFIG_HOME/credstore/</filename>,
<filename>$XDG_RUNTIME_DIR/credstore/</filename>, <filename>$HOME/.local/lib/credstore/</filename>
(and the counterparts ending with <filename>…/credstore.encrypted/</filename>) instead. The
<citerefentry><refentrytitle>systemd-path</refentrytitle><manvolnum>1</manvolnum></citerefentry> tool
may be used to query the precise credential store search path.</para>
<para>If the file system path is omitted it is chosen identical to the credential name, i.e. this is
a terse way to declare credentials to inherit from the service manager into a service. This option
may be used multiple times, each time defining an additional credential to pass to the unit.</para>
a terse way to declare credentials to inherit from the service manager or credstore directories into
a service. This option may be used multiple times, each time defining an additional credential to
pass to the unit.</para>
<para>Note that if the path is not specified or a valid credential identifier is given, i.e.
in the above two cases, a missing credential is not considered fatal.</para>

View File

@ -136,9 +136,9 @@
<term><option>--synthesize=<replaceable>BOOL</replaceable></option></term>
<listitem><para>Controls whether to synthesize records for the root and nobody users/groups if they
aren't defined otherwise. By default (or <literal>yes</literal>) such records are implicitly
synthesized if otherwise missing since they have special significance to the OS. When
<literal>no</literal> this synthesizing is turned off.</para>
aren't defined otherwise, as well as the user/groups for the "foreign" UID range. By default (or
<literal>yes</literal>) such records are implicitly synthesized if otherwise missing since they have
special significance to the OS. When <literal>no</literal> this synthesizing is turned off.</para>
<xi:include href="version-info.xml" xpointer="v245"/></listitem>
</varlistentry>

View File

@ -886,6 +886,9 @@ container_uid_base_max = get_option('container-uid-base-max')
conf.set('CONTAINER_UID_BASE_MIN', container_uid_base_min)
conf.set('CONTAINER_UID_BASE_MAX', container_uid_base_max)
foreign_uid_base = get_option('foreign-uid-base')
conf.set('FOREIGN_UID_BASE', foreign_uid_base)
nobody_user = get_option('nobody-user')
nobody_group = get_option('nobody-group')
@ -3001,6 +3004,7 @@ summary({
conf.get('SYSTEM_ALLOC_GID_MIN')),
'dynamic UIDs' : '@0@…@1@'.format(dynamic_uid_min, dynamic_uid_max),
'container UID bases' : '@0@…@1@'.format(container_uid_base_min, container_uid_base_max),
'foreign UID base' : '@0@'.format(foreign_uid_base),
'static UID/GID allocations' : ' '.join(static_ugids),
'/dev/kvm access mode' : get_option('dev-kvm-mode'),
'render group access mode' : get_option('group-render-mode'),

View File

@ -273,6 +273,8 @@ option('container-uid-base-min', type : 'integer', value : 0x00080000,
description : 'minimum container UID base')
option('container-uid-base-max', type : 'integer', value : 0x6FFF0000,
description : 'maximum container UID base')
option('foreign-uid-base', type : 'integer', value : 0x7FFE0000,
description : 'foreign OS image UID base')
option('adm-group', type : 'boolean',
description : 'the ACL for adm group should be added')
option('wheel-group', type : 'boolean',

View File

@ -1,5 +1,6 @@
#!/bin/bash
# SPDX-License-Identifier: LGPL-2.1-or-later
set -e
if command -v flatpak-spawn >/dev/null; then
SPAWN=(flatpak-spawn --host)
@ -7,7 +8,7 @@ else
SPAWN=()
fi
MKOSI_CONFIG="$("${SPAWN[@]}" --host mkosi --json summary | jq -r .Images[-1])"
MKOSI_CONFIG="$("${SPAWN[@]}" mkosi --json summary | jq -r .Images[-1])"
DISTRIBUTION="$(jq -r .Distribution <<< "$MKOSI_CONFIG")"
RELEASE="$(jq -r .Release <<< "$MKOSI_CONFIG")"
ARCH="$(jq -r .Architecture <<< "$MKOSI_CONFIG")"

View File

@ -29,6 +29,7 @@ RepartDirectories=mkosi.repart
OutputDirectory=build/mkosi.output
[Build]
ToolsTree=default
BuildDirectory=build/mkosi.builddir
CacheDirectory=build/mkosi.cache
BuildSourcesEphemeral=yes
@ -130,7 +131,7 @@ Packages=
zsh
zstd
[Host]
[Runtime]
Credentials=
journal.storage=persistent
tty.serial.hvc0.agetty.autologin=root

View File

@ -3,6 +3,7 @@
[Build]
ToolsTreePackages=
gcc
gdb
gperf
lcov
llvm

View File

@ -0,0 +1,7 @@
# SPDX-License-Identifier: LGPL-2.1-or-later
[Match]
ToolsTreeDistribution=centos
[Build]
ToolsTreeRepositories=epel,epel-next

View File

@ -5,6 +5,7 @@ ToolsTreeDistribution=opensuse
[Build]
ToolsTreePackages=
libz1
gh
mypy
pkgconfig(blkid)

View File

@ -0,0 +1,7 @@
# SPDX-License-Identifier: LGPL-2.1-or-later
[Match]
PathExists=build/
[Build]
ExtraSearchPaths=build/

View File

@ -8,8 +8,8 @@ msgid ""
msgstr ""
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-11-28 18:16+0900\n"
"PO-Revision-Date: 2024-11-20 19:13+0000\n"
"Last-Translator: Jiri Grönroos <jiri.gronroos@iki.fi>\n"
"PO-Revision-Date: 2024-12-20 15:38+0000\n"
"Last-Translator: Ricky Tigg <ricky.tigg@gmail.com>\n"
"Language-Team: Finnish <https://translate.fedoraproject.org/projects/systemd/"
"main/fi/>\n"
"Language: fi\n"
@ -17,7 +17,7 @@ msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=n != 1;\n"
"X-Generator: Weblate 5.8.2\n"
"X-Generator: Weblate 5.9.1\n"
#: src/core/org.freedesktop.systemd1.policy.in:22
msgid "Send passphrase back to system"
@ -1176,9 +1176,8 @@ msgid "Manage optional features"
msgstr "Hallitse valinnaisia ominaisuuksia"
#: src/sysupdate/org.freedesktop.sysupdate1.policy:76
#, fuzzy
msgid "Authentication is required to manage optional features."
msgstr "Todennus vaaditaan valinnaisten ominaisuuksien hallintaan"
msgstr "Todennus vaaditaan valinnaisten ominaisuuksien hallintaan."
#: src/timedate/org.freedesktop.timedate1.policy:22
msgid "Set system time"

View File

@ -67,7 +67,7 @@ _systemd_analyze() {
)
local -A VERBS=(
[STANDALONE]='time blame unit-files unit-paths exit-status compare-versions calendar timestamp timespan pcrs srk has-tpm2 smbios11'
[STANDALONE]='time blame unit-files unit-paths exit-status compare-versions calendar timestamp timespan pcrs srk has-tpm2 smbios11 chid'
[CRITICAL_CHAIN]='critical-chain'
[DOT]='dot'
[DUMP]='dump'

View File

@ -29,6 +29,8 @@ _systemd_dissect() {
local cur=${COMP_WORDS[COMP_CWORD]} prev_1=${COMP_WORDS[COMP_CWORD-1]} prev_2=${COMP_WORDS[COMP_CWORD-2]} words cword
local -A OPTS=(
[STANDALONE]='-h --help --version
--user
--system
--discover
--no-pager
--no-legend

343
src/analyze/analyze-chid.c Normal file
View File

@ -0,0 +1,343 @@
/* SPDX-License-Identifier: LGPL-2.1-or-later */
#include "analyze.h"
#include "analyze-chid.h"
#include "chid-fundamental.h"
#include "efi-api.h"
#include "escape.h"
#include "fd-util.h"
#include "fileio.h"
#include "format-table.h"
#include "parse-util.h"
#include "strv.h"
#include "utf8.h"
#include "virt.h"
static int parse_chid_type(const char *s, size_t *ret) {
unsigned u;
int r;
assert(s);
r = safe_atou(s, &u);
if (r < 0)
return r;
if (u >= CHID_TYPES_MAX)
return -ERANGE;
if (ret)
*ret = u;
return 0;
}
static const char *const chid_smbios_friendly[_CHID_SMBIOS_FIELDS_MAX] = {
[CHID_SMBIOS_MANUFACTURER] = "manufacturer",
[CHID_SMBIOS_FAMILY] = "family",
[CHID_SMBIOS_PRODUCT_NAME] = "product-name",
[CHID_SMBIOS_PRODUCT_SKU] = "product-sku",
[CHID_SMBIOS_BASEBOARD_MANUFACTURER] = "baseboard-manufacturer",
[CHID_SMBIOS_BASEBOARD_PRODUCT] = "baseboard-product",
[CHID_SMBIOS_BIOS_VENDOR] = "bios-vendor",
[CHID_SMBIOS_BIOS_VERSION] = "bios-version",
[CHID_SMBIOS_BIOS_MAJOR] = "bios-major",
[CHID_SMBIOS_BIOS_MINOR] = "bios-minor",
[CHID_SMBIOS_ENCLOSURE_TYPE] = "enclosure-type",
};
static const char chid_smbios_fields_char[_CHID_SMBIOS_FIELDS_MAX] = {
[CHID_SMBIOS_MANUFACTURER] = 'M',
[CHID_SMBIOS_FAMILY] = 'F',
[CHID_SMBIOS_PRODUCT_NAME] = 'P',
[CHID_SMBIOS_PRODUCT_SKU] = 'S',
[CHID_SMBIOS_BASEBOARD_MANUFACTURER] = 'm',
[CHID_SMBIOS_BASEBOARD_PRODUCT] = 'p',
[CHID_SMBIOS_BIOS_VENDOR] = 'B',
[CHID_SMBIOS_BIOS_VERSION] = 'v',
[CHID_SMBIOS_BIOS_MAJOR] = 'R',
[CHID_SMBIOS_BIOS_MINOR] = 'r',
[CHID_SMBIOS_ENCLOSURE_TYPE] = 'e',
};
static char *chid_smbios_fields_string(uint32_t combination) {
_cleanup_free_ char *s = NULL;
for (ChidSmbiosFields f = 0; f < _CHID_SMBIOS_FIELDS_MAX; f++) {
char c;
c = (combination & (UINT32_C(1) << f)) ? chid_smbios_fields_char[f] : '-';
if (!strextend(&s, CHAR_TO_STR(c)))
return NULL;
}
return TAKE_PTR(s);
}
static int add_chid(Table *table, const EFI_GUID guids[static CHID_TYPES_MAX], size_t t) {
int r;
assert(table);
assert(guids);
assert(t < CHID_TYPES_MAX);
sd_id128_t id = efi_guid_to_id128(guids + t);
if (sd_id128_is_null(id))
return 0;
_cleanup_free_ char *flags = chid_smbios_fields_string(chid_smbios_table[t]);
if (!flags)
return log_oom();
r = table_add_many(table,
TABLE_UINT, (unsigned) t,
TABLE_STRING, flags,
TABLE_UUID, id);
if (r < 0)
return table_log_add_error(r);
return 0;
}
static void smbios_fields_free(char16_t *(*fields)[_CHID_SMBIOS_FIELDS_MAX]) {
assert(fields);
FOREACH_ARRAY(i, *fields, _CHID_SMBIOS_FIELDS_MAX)
free(*i);
}
static int smbios_fields_acquire(char16_t *fields[static _CHID_SMBIOS_FIELDS_MAX]) {
static const char *const smbios_files[_CHID_SMBIOS_FIELDS_MAX] = {
[CHID_SMBIOS_MANUFACTURER] = "sys_vendor",
[CHID_SMBIOS_FAMILY] = "product_family",
[CHID_SMBIOS_PRODUCT_NAME] = "product_name",
[CHID_SMBIOS_PRODUCT_SKU] = "product_sku",
[CHID_SMBIOS_BASEBOARD_MANUFACTURER] = "board_vendor",
[CHID_SMBIOS_BASEBOARD_PRODUCT] = "board_name",
[CHID_SMBIOS_BIOS_VENDOR] = "bios_vendor",
[CHID_SMBIOS_BIOS_VERSION] = "bios_version",
[CHID_SMBIOS_BIOS_MAJOR] = "bios_release",
[CHID_SMBIOS_BIOS_MINOR] = "bios_release",
[CHID_SMBIOS_ENCLOSURE_TYPE] = "chassis_type",
};
int r;
_cleanup_close_ int smbios_fd = open("/sys/class/dmi/id", O_RDONLY|O_DIRECTORY|O_CLOEXEC);
if (smbios_fd < 0)
return log_error_errno(errno, "Failed to open SMBIOS sysfs object: %m");
for (ChidSmbiosFields f = 0; f < _CHID_SMBIOS_FIELDS_MAX; f++) {
_cleanup_free_ char *buf = NULL;
size_t size;
/* According to the CHID spec we should not generate CHIDs for SMBIOS fields that aren't set
* or are set to an empty string. Hence leave them NULL here. */
if (!smbios_files[f])
continue;
r = read_virtual_file_at(smbios_fd, smbios_files[f], SIZE_MAX, &buf, &size);
if (r == -ENOENT) {
log_debug_errno(r, "SMBIOS field '%s' not set, skipping.", smbios_files[f]);
continue;
}
if (r < 0)
return log_error_errno(r, "Failed to read SMBIOS field '%s': %m", smbios_files[f]);
if (size == 0 || (size == 1 && buf[0] == '\n')) {
log_debug("SMBIOS field '%s' is empty, skipping.", smbios_files[f]);
continue;
}
if (buf[size-1] != '\n')
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Expected SMBIOS field '%s' to end in newline, but it doesn't, refusing.", smbios_files[f]);
buf[size-1] = 0;
size--;
switch (f) {
case CHID_SMBIOS_BIOS_MAJOR:
case CHID_SMBIOS_BIOS_MINOR: {
/* The kernel exposes this a string <major>.<minor>, split them apart again. */
char *dot = memchr(buf, '.', size);
if (!dot)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "BIOS release field '%s' contains no dot?", smbios_files[f]);
const char *p;
if (f == CHID_SMBIOS_BIOS_MAJOR) {
*dot = 0;
p = buf;
} else {
assert(f == CHID_SMBIOS_BIOS_MINOR);
p = dot + 1;
}
/* The kernel exports the enclosure in decimal, we need it in hex (zero left-padded) */
uint8_t u;
r = safe_atou8(p, &u);
if (r < 0)
return log_error_errno(r, "Failed to parse BIOS release: %s", p);
buf = mfree(buf);
if (asprintf(&buf, "%02x", u) < 0)
return log_oom();
size = strlen(buf);
break;
}
case CHID_SMBIOS_ENCLOSURE_TYPE: {
/* The kernel exports the enclosure in decimal, we need it in hex (no padding!) */
uint8_t u;
r = safe_atou8(buf, &u);
if (r < 0)
return log_error_errno(r, "Failed to parse enclosure type: %s", buf);
buf = mfree(buf);
if (u == 0)
buf = strdup(""); /* zero is mapped to empty string */
else
(void) asprintf(&buf, "%x", u);
if (!buf)
return log_oom();
size = strlen(buf);
break;
}
default:
break;
}
fields[f] = utf8_to_utf16(buf, size);
if (!fields[f])
return log_oom();
}
return 0;
}
int verb_chid(int argc, char *argv[], void *userdata) {
_cleanup_(table_unrefp) Table *table = NULL;
int r;
if (detect_container() > 0)
return log_error_errno(SYNTHETIC_ERRNO(ENOTRECOVERABLE), "Container environments do not have SMBIOS.");
table = table_new("type", "input", "chid");
if (!table)
return log_oom();
(void) table_set_align_percent(table, table_get_cell(table, 0, 0), 100);
(void) table_set_align_percent(table, table_get_cell(table, 0, 1), 50);
_cleanup_(smbios_fields_free) char16_t* smbios_fields[_CHID_SMBIOS_FIELDS_MAX] = {};
r = smbios_fields_acquire(smbios_fields);
if (r < 0)
return r;
EFI_GUID chids[CHID_TYPES_MAX] = {};
chid_calculate((const char16_t* const*) smbios_fields, chids);
if (strv_isempty(strv_skip(argv, 1)))
for (size_t t = 0; t < CHID_TYPES_MAX; t++) {
r = add_chid(table, chids, t);
if (r < 0)
return r;
}
else {
STRV_FOREACH(as, strv_skip(argv, 1)) {
size_t t;
r = parse_chid_type(*as, &t);
if (r < 0)
return log_error_errno(r, "Failed to pare CHID type: %s", *as);
r = add_chid(table, chids, t);
if (r < 0)
return r;
}
(void) table_set_sort(table, (size_t) 0);
}
r = table_print_with_pager(table, arg_json_format_flags, arg_pager_flags, arg_legend);
if (r < 0)
return log_error_errno(r, "Failed to output table: %m");
if (!sd_json_format_enabled(arg_json_format_flags)) {
_cleanup_free_ char *legend = NULL;
bool separator = false;
size_t w = 0;
legend = strjoin(ansi_grey(), "LEGEND: ", ansi_normal());
if (!legend)
return log_oom();
for (ChidSmbiosFields f = 0; f < _CHID_SMBIOS_FIELDS_MAX; f++) {
_cleanup_free_ char *c = NULL;
if (smbios_fields[f]) {
_cleanup_free_ char *u = NULL;
u = utf16_to_utf8(smbios_fields[f], SIZE_MAX);
if (!u)
return log_oom();
c = cescape(u);
if (!c)
return log_oom();
}
if (!strextend(&legend,
ansi_grey(),
separator ? " " : "",
separator ? special_glyph(SPECIAL_GLYPH_HORIZONTAL_DOTTED) : "",
separator ? " " : "",
ansi_normal(),
CHAR_TO_STR(chid_smbios_fields_char[f]),
ansi_grey(),
" ",
special_glyph(SPECIAL_GLYPH_ARROW_RIGHT),
" ",
ansi_normal(),
chid_smbios_friendly[f],
ansi_grey(),
" (",
c ? ansi_highlight() : ansi_grey(),
strna(c),
ansi_grey(),
")",
ansi_normal()))
return log_oom();
w += separator * 3 +
4 +
utf8_console_width(chid_smbios_friendly[f]) +
2 +
utf8_console_width(strna(c)) +
1;
if (w > 79) {
if (!strextend(&legend, "\n "))
return log_oom();
separator = false;
w = 8;
} else
separator = true;
}
putchar('\n');
puts(legend);
}
return EXIT_SUCCESS;
}

View File

@ -0,0 +1,4 @@
/* SPDX-License-Identifier: LGPL-2.1-or-later */
#pragma once
int verb_chid(int argc, char *argv[], void *userdata);

View File

@ -18,6 +18,7 @@
#include "analyze-calendar.h"
#include "analyze-capability.h"
#include "analyze-cat-config.h"
#include "analyze-chid.h"
#include "analyze-compare-versions.h"
#include "analyze-condition.h"
#include "analyze-critical-chain.h"
@ -219,6 +220,7 @@ static int help(int argc, char *argv[], void *userdata) {
" filesystems [NAME...] List known filesystems\n"
" architectures [NAME...] List known architectures\n"
" smbios11 List strings passed via SMBIOS Type #11\n"
" chid List local CHIDs\n"
"\n%3$sExpression Evaluation:%4$s\n"
" condition CONDITION... Evaluate conditions and asserts\n"
" compare-versions VERSION1 [OP] VERSION2\n"
@ -593,10 +595,6 @@ static int parse_argv(int argc, char *argv[]) {
return log_error_errno(SYNTHETIC_ERRNO(EINVAL),
"Option --offline= requires one or more units to perform a security review.");
if (sd_json_format_enabled(arg_json_format_flags) && !STRPTR_IN_SET(argv[optind], "security", "inspect-elf", "plot", "fdstore", "pcrs", "architectures", "capability", "exit-status"))
return log_error_errno(SYNTHETIC_ERRNO(EINVAL),
"Option --json= is only supported for security, inspect-elf, plot, fdstore, pcrs, architectures, capability, exit-status right now.");
if (arg_threshold != 100 && !streq_ptr(argv[optind], "security"))
return log_error_errno(SYNTHETIC_ERRNO(EINVAL),
"Option --threshold= is only supported for security right now.");
@ -631,10 +629,6 @@ static int parse_argv(int argc, char *argv[]) {
if (streq_ptr(argv[optind], "condition") && arg_unit && optind < argc - 1)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "No conditions can be passed if --unit= is used.");
if ((!arg_legend && !STRPTR_IN_SET(argv[optind], "plot", "architectures")) ||
(streq_ptr(argv[optind], "plot") && !arg_legend && !arg_table && !sd_json_format_enabled(arg_json_format_flags)))
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Option --no-legend is only supported for plot with either --table or --json=.");
if (arg_table && !streq_ptr(argv[optind], "plot"))
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Option --table is only supported for plot right now.");
@ -691,6 +685,7 @@ static int run(int argc, char *argv[]) {
{ "srk", VERB_ANY, 1, 0, verb_srk },
{ "architectures", VERB_ANY, VERB_ANY, 0, verb_architectures },
{ "smbios11", VERB_ANY, 1, 0, verb_smbios11 },
{ "chid", VERB_ANY, VERB_ANY, 0, verb_chid },
{}
};

View File

@ -6,6 +6,7 @@ systemd_analyze_sources = files(
'analyze-calendar.c',
'analyze-capability.c',
'analyze-cat-config.c',
'analyze-chid.c',
'analyze-compare-versions.c',
'analyze-condition.c',
'analyze-critical-chain.c',

View File

@ -9,8 +9,8 @@
#define AUDIT_SESSION_INVALID UINT32_MAX
int audit_session_from_pid(const PidRef *pid, uint32_t *id);
int audit_loginuid_from_pid(const PidRef *pid, uid_t *uid);
int audit_session_from_pid(const PidRef *pid, uint32_t *ret_id);
int audit_loginuid_from_pid(const PidRef *pid, uid_t *ret_uid);
bool use_audit(void);

View File

@ -8,8 +8,9 @@
#include <unistd.h>
#include "alloc-util.h"
#include "capability-util.h"
#include "cap-list.h"
#include "capability-util.h"
#include "fd-util.h"
#include "fileio.h"
#include "log.h"
#include "logarithm.h"
@ -17,6 +18,8 @@
#include "missing_prctl.h"
#include "missing_threads.h"
#include "parse-util.h"
#include "pidref.h"
#include "stat-util.h"
#include "user-util.h"
int have_effective_cap(int value) {
@ -607,3 +610,78 @@ int capability_get_ambient(uint64_t *ret) {
*ret = a;
return 1;
}
int pidref_get_capability(const PidRef *pidref, CapabilityQuintet *ret) {
int r;
if (!pidref_is_set(pidref))
return -ESRCH;
if (pidref_is_remote(pidref))
return -EREMOTE;
const char *path = procfs_file_alloca(pidref->pid, "status");
_cleanup_fclose_ FILE *f = fopen(path, "re");
if (!f) {
if (errno == ENOENT && proc_mounted() == 0)
return -ENOSYS;
return -errno;
}
CapabilityQuintet q = CAPABILITY_QUINTET_NULL;
for (;;) {
_cleanup_free_ char *line = NULL;
r = read_line(f, LONG_LINE_MAX, &line);
if (r < 0)
return r;
if (r == 0)
break;
static const struct {
const char *field;
size_t offset;
} fields[] = {
{ "CapBnd:", offsetof(CapabilityQuintet, bounding) },
{ "CapInh:", offsetof(CapabilityQuintet, inheritable) },
{ "CapPrm:", offsetof(CapabilityQuintet, permitted) },
{ "CapEff:", offsetof(CapabilityQuintet, effective) },
{ "CapAmb:", offsetof(CapabilityQuintet, ambient) },
};
FOREACH_ELEMENT(i, fields) {
const char *p = first_word(line, i->field);
if (!p)
continue;
uint64_t *v = (uint64_t*) ((uint8_t*) &q + i->offset);
if (*v != CAP_MASK_UNSET)
return -EBADMSG;
r = safe_atoux64(p, v);
if (r < 0)
return r;
if (*v == CAP_MASK_UNSET)
return -EBADMSG;
}
}
if (q.effective == CAP_MASK_UNSET ||
q.inheritable == CAP_MASK_UNSET ||
q.permitted == CAP_MASK_UNSET ||
q.effective == CAP_MASK_UNSET ||
q.ambient == CAP_MASK_UNSET)
return -EBADMSG;
r = pidref_verify(pidref);
if (r < 0)
return r;
if (ret)
*ret = q;
return 0;
}

View File

@ -8,6 +8,7 @@
#include "macro.h"
#include "missing_capability.h"
#include "pidref.h"
/* Special marker used when storing a capabilities mask as "unset" */
#define CAP_MASK_UNSET UINT64_MAX
@ -66,14 +67,18 @@ typedef struct CapabilityQuintet {
assert_cc(CAP_LAST_CAP < 64);
#define CAPABILITY_QUINTET_NULL { CAP_MASK_UNSET, CAP_MASK_UNSET, CAP_MASK_UNSET, CAP_MASK_UNSET, CAP_MASK_UNSET }
#define CAPABILITY_QUINTET_NULL (CapabilityQuintet) { CAP_MASK_UNSET, CAP_MASK_UNSET, CAP_MASK_UNSET, CAP_MASK_UNSET, CAP_MASK_UNSET }
static inline bool capability_is_set(uint64_t v) {
return v != CAP_MASK_UNSET;
}
static inline bool capability_quintet_is_set(const CapabilityQuintet *q) {
return q->effective != CAP_MASK_UNSET ||
q->bounding != CAP_MASK_UNSET ||
q->inheritable != CAP_MASK_UNSET ||
q->permitted != CAP_MASK_UNSET ||
q->ambient != CAP_MASK_UNSET;
return capability_is_set(q->effective) ||
capability_is_set(q->bounding) ||
capability_is_set(q->inheritable) ||
capability_is_set(q->permitted) ||
capability_is_set(q->ambient);
}
/* Mangles the specified caps quintet taking the current bounding set into account:
@ -84,3 +89,5 @@ bool capability_quintet_mangle(CapabilityQuintet *q);
int capability_quintet_enforce(const CapabilityQuintet *q);
int capability_get_ambient(uint64_t *ret);
int pidref_get_capability(const PidRef *pidref, CapabilityQuintet *ret);

View File

@ -744,10 +744,15 @@ int chase_extract_filename(const char *path, const char *root, char **ret) {
return strdup_to(ret, ".");
}
int chase_and_open(const char *path, const char *root, ChaseFlags chase_flags, int open_flags, char **ret_path) {
int chase_and_open(
const char *path,
const char *root,
ChaseFlags chase_flags,
int open_flags,
char **ret_path) {
_cleanup_close_ int path_fd = -EBADF;
_cleanup_free_ char *p = NULL, *fname = NULL;
mode_t mode = open_flags & O_DIRECTORY ? 0755 : 0644;
int r;
assert(!(chase_flags & (CHASE_NONEXISTENT|CHASE_STEP)));
@ -758,7 +763,7 @@ int chase_and_open(const char *path, const char *root, ChaseFlags chase_flags, i
return xopenat_full(AT_FDCWD, path,
open_flags | (FLAGS_SET(chase_flags, CHASE_NOFOLLOW) ? O_NOFOLLOW : 0),
/* xopen_flags = */ 0,
mode);
MODE_INVALID);
r = chase(path, root, CHASE_PARENT|chase_flags, &p, &path_fd);
if (r < 0)
@ -772,7 +777,7 @@ int chase_and_open(const char *path, const char *root, ChaseFlags chase_flags, i
return r;
}
r = xopenat_full(path_fd, strempty(fname), open_flags|O_NOFOLLOW, /* xopen_flags = */ 0, mode);
r = xopenat_full(path_fd, strempty(fname), open_flags|O_NOFOLLOW, /* xopen_flags = */ 0, MODE_INVALID);
if (r < 0)
return r;
@ -948,10 +953,15 @@ int chase_and_open_parent(const char *path, const char *root, ChaseFlags chase_f
return pfd;
}
int chase_and_openat(int dir_fd, const char *path, ChaseFlags chase_flags, int open_flags, char **ret_path) {
int chase_and_openat(
int dir_fd,
const char *path,
ChaseFlags chase_flags,
int open_flags,
char **ret_path) {
_cleanup_close_ int path_fd = -EBADF;
_cleanup_free_ char *p = NULL, *fname = NULL;
mode_t mode = open_flags & O_DIRECTORY ? 0755 : 0644;
int r;
assert(!(chase_flags & (CHASE_NONEXISTENT|CHASE_STEP)));
@ -962,7 +972,7 @@ int chase_and_openat(int dir_fd, const char *path, ChaseFlags chase_flags, int o
return xopenat_full(dir_fd, path,
open_flags | (FLAGS_SET(chase_flags, CHASE_NOFOLLOW) ? O_NOFOLLOW : 0),
/* xopen_flags = */ 0,
mode);
MODE_INVALID);
r = chaseat(dir_fd, path, chase_flags|CHASE_PARENT, &p, &path_fd);
if (r < 0)
@ -974,7 +984,7 @@ int chase_and_openat(int dir_fd, const char *path, ChaseFlags chase_flags, int o
return r;
}
r = xopenat_full(path_fd, strempty(fname), open_flags|O_NOFOLLOW, /* xopen_flags = */ 0, mode);
r = xopenat_full(path_fd, strempty(fname), open_flags|O_NOFOLLOW, /* xopen_flags= */ 0, MODE_INVALID);
if (r < 0)
return r;

View File

@ -1133,10 +1133,17 @@ int xopenat_full(int dir_fd, const char *path, int open_flags, XOpenFlags xopen_
* If the path is specified NULL or empty, behaves like fd_reopen().
*
* If XO_NOCOW is specified will turn on the NOCOW btrfs flag on the file, if available.
*
* If mode is specified as MODE_INVALID, we'll use 0755 for dirs, and 0644 for regular files.
*/
if (mode == MODE_INVALID)
mode = (open_flags & O_DIRECTORY) ? 0755 : 0644;
if (isempty(path)) {
assert(!FLAGS_SET(open_flags, O_CREAT|O_EXCL));
if (FLAGS_SET(open_flags, O_CREAT|O_EXCL))
return -EEXIST;
return fd_reopen(dir_fd, open_flags & ~O_NOFOLLOW);
}

View File

@ -95,10 +95,18 @@ assert_cc(MS_LAZYTIME == (1<<25));
#endif
/* linux/nsfs.h */
#ifndef NS_GET_NSTYPE /* d95fa3c76a66b6d76b1e109ea505c55e66360f3c (4.11) */
#ifndef NS_GET_USERNS /* 6786741dbf99e44fb0c0ed85a37582b8a26f1c3b (4.9) */
#define NS_GET_USERNS _IO(0xb7, 0x1)
#endif
#ifndef NS_GET_NSTYPE /* e5ff5ce6e20ee22511398bb31fb912466cf82a36 (4.11) */
#define NS_GET_NSTYPE _IO(0xb7, 0x3)
#endif
#ifndef NS_GET_OWNER_UID /* d95fa3c76a66b6d76b1e109ea505c55e66360f3c (4.11) */
#define NS_GET_OWNER_UID _IO(0xb7, 0x4)
#endif
#ifndef FS_PROJINHERIT_FL
# define FS_PROJINHERIT_FL 0x20000000
#else

View File

@ -585,3 +585,174 @@ int is_idmapping_supported(const char *path) {
return true;
}
static int uid_map_search_root(pid_t pid, const char *filename, uid_t *ret) {
int r;
assert(pid_is_valid(pid));
assert(filename);
assert(ret);
const char *p = procfs_file_alloca(pid, filename);
_cleanup_fclose_ FILE *f = fopen(p, "re");
if (!f) {
if (errno != ENOENT)
return -errno;
return proc_mounted() > 0 ? -EOPNOTSUPP : -ENOSYS;
}
for (;;) {
uid_t uid_base = UID_INVALID, uid_shift = UID_INVALID, uid_range = UID_INVALID;
errno = 0;
r = fscanf(f, UID_FMT " " UID_FMT " " UID_FMT "\n", &uid_base, &uid_shift, &uid_range);
if (r == EOF)
return errno_or_else(ENOMSG);
assert(r >= 0);
if (r != 3)
return -EBADMSG;
if (uid_range <= 0)
return -EBADMSG;
if (uid_base == 0)
*ret = uid_shift;
return 0;
}
}
int userns_get_base_uid(int userns_fd, uid_t *ret_uid, gid_t *ret_gid) {
_cleanup_(close_pairp) int pfd[2] = EBADF_PAIR;
_cleanup_(sigkill_waitp) pid_t pid = 0;
ssize_t n;
char x;
int r;
assert(userns_fd >= 0);
if (pipe2(pfd, O_CLOEXEC) < 0)
return -errno;
r = safe_fork_full(
"(sd-baseuns)",
/* stdio_fds= */ NULL,
(int[]) { pfd[1], userns_fd }, 2,
FORK_CLOSE_ALL_FDS|FORK_DEATHSIG_SIGKILL,
&pid);
if (r < 0)
return r;
if (r == 0) {
/* Child. */
if (setns(userns_fd, CLONE_NEWUSER) < 0) {
log_debug_errno(errno, "Failed to join userns: %m");
_exit(EXIT_FAILURE);
}
userns_fd = safe_close(userns_fd);
n = write(pfd[1], &(const char) { 'x' }, 1);
if (n < 0) {
log_debug_errno(errno, "Failed to write to fifo: %m");
_exit(EXIT_FAILURE);
}
assert(n == 1);
freeze();
}
pfd[1] = safe_close(pfd[1]);
n = read(pfd[0], &x, 1);
if (n < 0)
return -errno;
if (n == 0)
return -EPROTO;
assert(n == 1);
assert(x == 'x');
uid_t uid;
r = uid_map_search_root(pid, "uid_map", &uid);
if (r < 0)
return r;
gid_t gid;
r = uid_map_search_root(pid, "gid_map", &gid);
if (r < 0)
return r;
if (!ret_gid && uid != gid)
return -ENOMSG;
if (ret_uid)
*ret_uid = uid;
if (ret_gid)
*ret_gid = gid;
return 0;
}
int process_is_owned_by_uid(PidRef *pidref, uid_t uid) {
int r;
/* Checks if the specified process either is owned directly by the specified user, or if it is inside
* a user namespace owned by it. */
uid_t process_uid;
r = pidref_get_uid(pidref, &process_uid);
if (r < 0)
return r;
if (process_uid == uid)
return true;
_cleanup_close_ int userns_fd = -EBADF;
r = pidref_namespace_open(
pidref,
/* ret_pidns_fd= */ NULL,
/* ret_mntns_fd= */ NULL,
/* ret_netns_fd= */ NULL,
&userns_fd,
/* ret_root_fd= */ NULL);
if (ERRNO_IS_NOT_SUPPORTED(r)) /* If userns is not supported, then they don't matter for ownership */
return false;
if (r < 0)
return r;
for (unsigned iteration = 0;; iteration++) {
uid_t ns_uid;
/* This process is in our own userns? Then we are done, in our own userns only the UIDs
* themselves matter. */
r = is_our_namespace(userns_fd, NAMESPACE_USER);
if (r < 0)
return r;
if (r > 0)
return false;
if (ioctl(userns_fd, NS_GET_OWNER_UID, &ns_uid) < 0) {
/* kernel too old? then we cannot make the determination, let's hence say no */
if (ERRNO_IS_NOT_SUPPORTED(errno))
return false;
return -errno;
}
if (ns_uid == uid)
return true;
/* Paranoia check */
if (iteration > 16)
return log_debug_errno(SYNTHETIC_ERRNO(ENOTRECOVERABLE), "Giving up while tracing parents of user namespaces after %u steps.", iteration);
/* Go up the tree */
_cleanup_close_ int parent_fd = ioctl(userns_fd, NS_GET_USERNS);
if (parent_fd < 0) {
if (errno == EPERM) /* EPERM means we left our own userns */
return false;
return -errno;
}
close_and_replace(userns_fd, parent_fd);
}
}

View File

@ -80,3 +80,7 @@ int namespace_is_init(NamespaceType type);
int is_our_namespace(int fd, NamespaceType type);
int is_idmapping_supported(const char *path);
int userns_get_base_uid(int userns_fd, uid_t *ret_uid, gid_t *ret_gid);
int process_is_owned_by_uid(PidRef *pidref, uid_t uid);

View File

@ -500,22 +500,6 @@ int pidref_is_kernel_thread(const PidRef *pid) {
return result;
}
int get_process_capeff(pid_t pid, char **ret) {
const char *p;
int r;
assert(pid >= 0);
assert(ret);
p = procfs_file_alloca(pid, "status");
r = get_proc_field(p, "CapEff", WHITESPACE, ret);
if (r == -ENOENT)
return -ESRCH;
return r;
}
static int get_process_link_contents(pid_t pid, const char *proc_file, char **ret) {
const char *p;
int r;

View File

@ -50,7 +50,6 @@ int get_process_exe(pid_t pid, char **ret);
int pid_get_uid(pid_t pid, uid_t *ret);
int pidref_get_uid(const PidRef *pid, uid_t *ret);
int get_process_gid(pid_t pid, gid_t *ret);
int get_process_capeff(pid_t pid, char **ret);
int get_process_cwd(pid_t pid, char **ret);
int get_process_root(pid_t pid, char **ret);
int get_process_environ(pid_t pid, char **ret);

View File

@ -127,5 +127,5 @@ bool uid_for_system_journal(uid_t uid) {
/* Returns true if the specified UID shall get its data stored in the system journal. */
return uid_is_system(uid) || uid_is_dynamic(uid) || uid == UID_NOBODY || uid_is_container(uid);
return uid_is_system(uid) || uid_is_dynamic(uid) || uid == UID_NOBODY || uid_is_container(uid) || uid_is_foreign(uid);
}

View File

@ -12,6 +12,10 @@ assert_cc((CONTAINER_UID_BASE_MAX & 0xFFFFU) == 0);
#define CONTAINER_UID_MIN (CONTAINER_UID_BASE_MIN)
#define CONTAINER_UID_MAX (CONTAINER_UID_BASE_MAX + 0xFFFFU)
assert_cc((FOREIGN_UID_BASE & 0xFFFFU) == 0);
#define FOREIGN_UID_MIN (FOREIGN_UID_BASE)
#define FOREIGN_UID_MAX (FOREIGN_UID_BASE + 0xFFFFU)
bool uid_is_system(uid_t uid);
bool gid_is_system(gid_t gid);
@ -31,6 +35,14 @@ static inline bool gid_is_container(gid_t gid) {
return uid_is_container((uid_t) gid);
}
static inline bool uid_is_foreign(uid_t uid) {
return FOREIGN_UID_MIN <= uid && uid <= FOREIGN_UID_MAX;
}
static inline bool gid_is_foreign(gid_t gid) {
return uid_is_foreign((uid_t) gid);
}
typedef struct UGIDAllocationRange {
uid_t system_alloc_uid_min;
uid_t system_uid_max;

View File

@ -3126,6 +3126,49 @@ int unit_attach_pids_to_cgroup(Unit *u, Set *pids, const char *suffix_path) {
return ret;
}
int unit_remove_subcgroup(Unit *u, const char *suffix_path) {
int r;
assert(u);
if (!UNIT_HAS_CGROUP_CONTEXT(u))
return -EINVAL;
if (!unit_cgroup_delegate(u))
return -ENOMEDIUM;
r = unit_pick_cgroup_path(u);
if (r < 0)
return r;
CGroupRuntime *crt = unit_get_cgroup_runtime(u);
if (!crt || !crt->cgroup_path)
return -EOWNERDEAD;
_cleanup_free_ char *j = NULL;
bool delete_root;
const char *d;
if (empty_or_root(suffix_path)) {
d = empty_to_root(crt->cgroup_path);
delete_root = false; /* Don't attempt to delete the main cgroup of this unit */
} else {
j = path_join(crt->cgroup_path, suffix_path);
if (!j)
return -ENOMEM;
d = j;
delete_root = true;
}
log_unit_debug(u, "Removing subcgroup '%s'...", d);
r = cg_trim_everywhere(u->manager->cgroup_supported, d, delete_root);
if (r < 0)
return log_unit_debug_errno(u, r, "Failed to fully %s cgroup '%s': %m", delete_root ? "remove" : "trim", d);
return 0;
}
static bool unit_has_mask_realized(
Unit *u,
CGroupMask target_mask,
@ -3567,6 +3610,49 @@ static bool unit_maybe_release_cgroup(Unit *u) {
return false;
}
static int unit_prune_cgroup_via_bus(Unit *u) {
_cleanup_(sd_bus_error_free) sd_bus_error error = SD_BUS_ERROR_NULL;
int r;
assert(u);
if (MANAGER_IS_SYSTEM(u->manager))
return -EINVAL;
if (!u->manager->system_bus)
return -EIO;
CGroupRuntime *crt = unit_get_cgroup_runtime(u);
if (!crt || !crt->cgroup_path)
return -EOWNERDEAD;
/* Determine this unit's cgroup path relative to our cgroup root */
const char *pp = path_startswith(crt->cgroup_path, u->manager->cgroup_root);
if (!pp)
return -EINVAL;
_cleanup_free_ char *absolute = NULL;
if (!path_is_absolute(pp)) { /* RemoveSubgroupFromUnit() wants an absolute path */
absolute = strjoin("/", pp);
if (!absolute)
return -ENOMEM;
pp = absolute;
}
r = bus_call_method(u->manager->system_bus,
bus_systemd_mgr,
"RemoveSubgroupFromUnit",
&error, NULL,
"ss",
NULL /* empty unit name means client's unit, i.e. us */,
pp);
if (r < 0)
return log_unit_debug_errno(u, r, "Failed to trim cgroup via the bus: %s", bus_error_message(&error, r));
return 0;
}
void unit_prune_cgroup(Unit *u) {
bool is_root_slice;
int r;
@ -3598,13 +3684,21 @@ void unit_prune_cgroup(Unit *u) {
is_root_slice = unit_has_name(u, SPECIAL_ROOT_SLICE);
r = cg_trim_everywhere(u->manager->cgroup_supported, crt->cgroup_path, !is_root_slice);
if (r < 0)
if (r < 0) {
int k = unit_prune_cgroup_via_bus(u);
if (k >= 0)
log_unit_debug_errno(u, r, "Failed to destroy cgroup %s on our own (%m), but worked when talking to PID 1.", empty_to_root(crt->cgroup_path));
else {
/* One reason we could have failed here is, that the cgroup still contains a process.
* However, if the cgroup becomes removable at a later time, it might be removed when
* the containing slice is stopped. So even if we failed now, this unit shouldn't assume
* that the cgroup is still realized the next time it is started. Do not return early
* on error, continue cleanup. */
log_unit_full_errno(u, r == -EBUSY ? LOG_DEBUG : LOG_WARNING, r, "Failed to destroy cgroup %s, ignoring: %m", empty_to_root(crt->cgroup_path));
* the containing slice is stopped. So even if we failed now, this unit shouldn't
* assume that the cgroup is still realized the next time it is started. Do not
* return early on error, continue cleanup. */
log_unit_full_errno(u, r == -EBUSY ? LOG_DEBUG : LOG_WARNING, r,
"Failed to destroy cgroup %s, ignoring: %m", empty_to_root(crt->cgroup_path));
}
}
if (is_root_slice)
return;

View File

@ -456,6 +456,7 @@ int unit_check_oomd_kill(Unit *u);
int unit_check_oom(Unit *u);
int unit_attach_pids_to_cgroup(Unit *u, Set *pids, const char *suffix_path);
int unit_remove_subcgroup(Unit *u, const char *suffix_path);
int manager_setup_cgroup(Manager *m);
void manager_shutdown_cgroup(Manager *m, bool delete);

View File

@ -959,6 +959,12 @@ static int method_attach_processes_to_unit(sd_bus_message *message, void *userda
return method_generic_unit_operation(message, userdata, error, bus_unit_method_attach_processes, GENERIC_UNIT_VALIDATE_LOADED);
}
static int method_remove_subgroup_from_unit(sd_bus_message *message, void *userdata, sd_bus_error *error) {
/* Don't allow removal of subgroups from units that aren't loaded. But allow loading the unit, since
* this is clean-up work, that is OK to do when the unit is stopped already. */
return method_generic_unit_operation(message, userdata, error, bus_unit_method_remove_subgroup, GENERIC_UNIT_LOAD|GENERIC_UNIT_VALIDATE_LOADED);
}
static int transient_unit_from_message(
Manager *m,
sd_bus_message *message,
@ -3245,6 +3251,11 @@ const sd_bus_vtable bus_manager_vtable[] = {
SD_BUS_NO_RESULT,
method_attach_processes_to_unit,
SD_BUS_VTABLE_UNPRIVILEGED),
SD_BUS_METHOD_WITH_ARGS("RemoveSubgroupFromUnit",
SD_BUS_ARGS("s", unit_name, "s", subcgroup),
SD_BUS_NO_RESULT,
method_remove_subgroup_from_unit,
SD_BUS_VTABLE_UNPRIVILEGED),
SD_BUS_METHOD_WITH_ARGS("AbandonScope",
SD_BUS_ARGS("s", name),
SD_BUS_NO_RESULT,

View File

@ -1528,7 +1528,7 @@ int bus_unit_method_attach_processes(sd_bus_message *message, void *userdata, sd
return r;
for (;;) {
_cleanup_(pidref_freep) PidRef *pidref = NULL;
uid_t process_uid, sender_uid;
uid_t sender_uid;
uint32_t upid;
r = sd_bus_message_read(message, "u", &upid);
@ -1563,16 +1563,19 @@ int bus_unit_method_attach_processes(sd_bus_message *message, void *userdata, sd
if (r < 0)
return r;
/* Let's validate security: if the sender is root, then all is OK. If the sender is any other unit,
* then the process' UID and the target unit's UID have to match the sender's UID */
/* Let's validate security: if the sender is root, then all is OK. If the sender is any other
* user, then the process' UID and the target unit's UID have to match the sender's UID */
if (sender_uid != 0 && sender_uid != getuid()) {
r = pidref_get_uid(pidref, &process_uid);
r = process_is_owned_by_uid(pidref, sender_uid);
if (r < 0)
return sd_bus_error_set_errnof(error, r, "Failed to retrieve process UID: %m");
if (process_uid != sender_uid)
return sd_bus_error_set_errnof(error, r, "Failed to check if process " PID_FMT " is owned by client's UID: %m", pidref->pid);
if (r == 0)
return sd_bus_error_setf(error, SD_BUS_ERROR_ACCESS_DENIED, "Process " PID_FMT " not owned by client's UID. Refusing.", pidref->pid);
if (process_uid != u->ref_uid)
r = process_is_owned_by_uid(pidref, u->ref_uid);
if (r < 0)
return sd_bus_error_set_errnof(error, r, "Failed to check if process " PID_FMT " is owned by target unit's UID: %m", pidref->pid);
if (r == 0)
return sd_bus_error_setf(error, SD_BUS_ERROR_ACCESS_DENIED, "Process " PID_FMT " not owned by target unit's UID. Refusing.", pidref->pid);
}
@ -1592,6 +1595,54 @@ int bus_unit_method_attach_processes(sd_bus_message *message, void *userdata, sd
return sd_bus_reply_method_return(message, NULL);
}
int bus_unit_method_remove_subgroup(sd_bus_message *message, void *userdata, sd_bus_error *error) {
Unit *u = ASSERT_PTR(userdata);
int r;
assert(message);
/* This removes a subcgroup of the unit, regardless which user owns the subcgroup. This is useful
* when cgroup delegation is enabled for a unit, and the unit subdelegates the cgroup further */
r = mac_selinux_unit_access_check(u, message, "stop", error);
if (r < 0)
return r;
const char *path;
r = sd_bus_message_read(message, "s", &path);
if (r < 0)
return r;
if (!unit_cgroup_delegate(u))
return sd_bus_error_set(error, SD_BUS_ERROR_INVALID_ARGS, "Subcgroup removal not available on non-delegated units.");
if (!path_is_absolute(path))
return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS, "Control group path is not absolute: %s", path);
if (!path_is_normalized(path))
return sd_bus_error_setf(error, SD_BUS_ERROR_INVALID_ARGS, "Control group path is not normalized: %s", path);
_cleanup_(sd_bus_creds_unrefp) sd_bus_creds *creds = NULL;
r = sd_bus_query_sender_creds(message, SD_BUS_CREDS_EUID, &creds);
if (r < 0)
return r;
uid_t sender_uid;
r = sd_bus_creds_get_euid(creds, &sender_uid);
if (r < 0)
return r;
/* Allow this only if the client is privileged, is us, or is the user of the unit itself. */
if (sender_uid != 0 && sender_uid != getuid() && sender_uid != u->ref_uid)
return sd_bus_error_setf(error, SD_BUS_ERROR_ACCESS_DENIED, "Client is not permitted to alter cgroup.");
r = unit_remove_subcgroup(u, path);
if (r < 0)
return sd_bus_error_set_errnof(error, r, "Failed to remove subgroup %s: %m", path);
return sd_bus_reply_method_return(message, NULL);
}
const sd_bus_vtable bus_unit_cgroup_vtable[] = {
SD_BUS_VTABLE_START(0),
SD_BUS_PROPERTY("Slice", "s", property_get_slice, 0, 0),
@ -1631,6 +1682,12 @@ const sd_bus_vtable bus_unit_cgroup_vtable[] = {
bus_unit_method_attach_processes,
SD_BUS_VTABLE_UNPRIVILEGED),
SD_BUS_METHOD_WITH_ARGS("RemoveSubgroup",
SD_BUS_ARGS("s", subcgroup),
SD_BUS_NO_RESULT,
bus_unit_method_remove_subgroup,
SD_BUS_VTABLE_UNPRIVILEGED),
SD_BUS_VTABLE_END
};

View File

@ -22,6 +22,7 @@ int bus_unit_set_properties(Unit *u, sd_bus_message *message, UnitWriteFlags fla
int bus_unit_method_set_properties(sd_bus_message *message, void *userdata, sd_bus_error *error);
int bus_unit_method_get_processes(sd_bus_message *message, void *userdata, sd_bus_error *error);
int bus_unit_method_attach_processes(sd_bus_message *message, void *userdata, sd_bus_error *error);
int bus_unit_method_remove_subgroup(sd_bus_message *message, void *userdata, sd_bus_error *error);
int bus_unit_method_ref(sd_bus_message *message, void *userdata, sd_bus_error *error);
int bus_unit_method_unref(sd_bus_message *message, void *userdata, sd_bus_error *error);
int bus_unit_method_clean(sd_bus_message *message, void *userdata, sd_bus_error *error);

View File

@ -85,7 +85,9 @@ static int device_set_sysfs(Device *d, const char *sysfs) {
Unit *u = UNIT(ASSERT_PTR(d));
int r;
if (streq_ptr(d->sysfs, sysfs))
assert(sysfs);
if (path_equal(d->sysfs, sysfs))
return 0;
Hashmap **devices = &u->manager->devices_by_sysfs;
@ -332,6 +334,20 @@ static void device_catchup(Unit *u) {
Device *d = ASSERT_PTR(DEVICE(u));
/* Second, let's update the state with the enumerated state */
/* If Device.found (set from Device.deserialized_found) does not have DEVICE_FOUND_UDEV, and the
* device has not been processed by udevd while enumeration, it indicates the unit was never active
* before reexecution, hence we can safely drop the flag from Device.enumerated_found. The device
* will be set up later when udev finishes processing (see also comment in
* device_setup_devlink_unit_one()).
*
* NB: 💣💣💣 If Device.found already contains udev, i.e. the unit was fully ready before
* reexecution, do not unset the flag. Otherwise, e.g. if systemd-udev-trigger.service is started
* just before reexec, reload, and so on, devices being reprocessed (carrying ID_PROCESSING=1
* property) on enumeration and will enter dead state. See issue #35329. */
if (!FLAGS_SET(d->found, DEVICE_FOUND_UDEV) && !d->processed)
d->enumerated_found &= ~DEVICE_FOUND_UDEV;
device_update_found_one(d, d->enumerated_found, _DEVICE_FOUND_MASK);
}
@ -777,8 +793,16 @@ static int device_setup_devlink_unit_one(Manager *m, const char *devlink, Set **
assert(ready_units);
assert(not_ready_units);
if (sd_device_new_from_devname(&dev, devlink) >= 0 && device_is_ready(dev))
if (sd_device_new_from_devname(&dev, devlink) >= 0 && device_is_ready(dev)) {
if (MANAGER_IS_RUNNING(m) && device_is_processed(dev) <= 0)
/* The device is being processed by udevd. We will receive relevant uevent for the
* device later when completed. Let's ignore the device now. */
return 0;
/* Note, even if the device is being processed by udevd, setup the unit on enumerate.
* See also the comments in device_catchup(). */
return device_setup_unit(m, dev, devlink, /* main = */ false, ready_units);
}
/* the devlink is already removed or not ready */
if (device_by_path(m, devlink, &u) < 0)
@ -874,14 +898,15 @@ static int device_setup_extra_units(Manager *m, sd_device *dev, Set **ready_unit
return 0;
}
static int device_setup_units(Manager *m, sd_device *dev, Set **ready_units, Set **not_ready_units) {
static int device_setup_units(Manager *m, sd_device *dev, Set **ret_ready_units, Set **ret_not_ready_units) {
_cleanup_set_free_ Set *ready_units = NULL, *not_ready_units = NULL;
const char *syspath, *devname = NULL;
int r;
assert(m);
assert(dev);
assert(ready_units);
assert(not_ready_units);
assert(ret_ready_units);
assert(ret_not_ready_units);
r = sd_device_get_syspath(dev, &syspath);
if (r < 0)
@ -901,13 +926,13 @@ static int device_setup_units(Manager *m, sd_device *dev, Set **ready_units, Set
/* Add the main unit named after the syspath. If this one fails, don't bother with the rest,
* as this one shall be the main device unit the others just follow. (Compare with how
* device_following() is implemented, see below, which looks for the sysfs device.) */
r = device_setup_unit(m, dev, syspath, /* main = */ true, ready_units);
r = device_setup_unit(m, dev, syspath, /* main = */ true, &ready_units);
if (r < 0)
return r;
/* Add an additional unit for the device node */
if (sd_device_get_devname(dev, &devname) >= 0)
(void) device_setup_unit(m, dev, devname, /* main = */ false, ready_units);
(void) device_setup_unit(m, dev, devname, /* main = */ false, &ready_units);
} else {
Unit *u;
@ -915,28 +940,30 @@ static int device_setup_units(Manager *m, sd_device *dev, Set **ready_units, Set
/* If the device exists but not ready, then save the units and unset udev bits later. */
if (device_by_path(m, syspath, &u) >= 0) {
r = set_ensure_put(not_ready_units, NULL, DEVICE(u));
r = set_ensure_put(&not_ready_units, NULL, DEVICE(u));
if (r < 0)
log_unit_debug_errno(u, r, "Failed to store unit, ignoring: %m");
}
if (sd_device_get_devname(dev, &devname) >= 0 &&
device_by_path(m, devname, &u) >= 0) {
r = set_ensure_put(not_ready_units, NULL, DEVICE(u));
r = set_ensure_put(&not_ready_units, NULL, DEVICE(u));
if (r < 0)
log_unit_debug_errno(u, r, "Failed to store unit, ignoring: %m");
}
}
/* Next, add/update additional .device units point to aliases and symlinks. */
(void) device_setup_extra_units(m, dev, ready_units, not_ready_units);
(void) device_setup_extra_units(m, dev, &ready_units, &not_ready_units);
/* Safety check: no unit should be in ready_units and not_ready_units simultaneously. */
Unit *u;
SET_FOREACH(u, *not_ready_units)
if (set_remove(*ready_units, u))
SET_FOREACH(u, not_ready_units)
if (set_remove(ready_units, u))
log_unit_error(u, "Cannot activate and deactivate the unit simultaneously. Deactivating.");
*ret_ready_units = TAKE_PTR(ready_units);
*ret_not_ready_units = TAKE_PTR(not_ready_units);
return 0;
}
@ -1046,13 +1073,32 @@ static void device_enumerate(Manager *m) {
FOREACH_DEVICE(e, dev) {
_cleanup_set_free_ Set *ready_units = NULL, *not_ready_units = NULL;
const char *syspath;
bool processed;
Device *d;
r = sd_device_get_syspath(dev, &syspath);
if (r < 0) {
log_device_debug_errno(dev, r, "Failed to get syspath of enumerated device, ignoring: %m");
continue;
}
r = device_is_processed(dev);
if (r < 0)
log_device_debug_errno(dev, r, "Failed to check if device is processed by udevd, assuming not: %m");
processed = r > 0;
if (device_setup_units(m, dev, &ready_units, &not_ready_units) < 0)
continue;
SET_FOREACH(d, ready_units)
SET_FOREACH(d, ready_units) {
device_update_found_one(d, DEVICE_FOUND_UDEV, DEVICE_FOUND_UDEV);
/* Why we need to check the syspath here? Because the device unit may be generated by
* a devlink, and the syspath may be different from the one of the original device. */
if (path_equal(d->sysfs, syspath))
d->processed = processed;
}
SET_FOREACH(d, not_ready_units)
device_update_found_one(d, DEVICE_NOT_FOUND, DEVICE_FOUND_UDEV);
}
@ -1097,7 +1143,6 @@ static void device_remove_old_on_move(Manager *m, sd_device *dev) {
}
static int device_dispatch_io(sd_device_monitor *monitor, sd_device *dev, void *userdata) {
_cleanup_set_free_ Set *ready_units = NULL, *not_ready_units = NULL;
Manager *m = ASSERT_PTR(userdata);
sd_device_action_t action;
const char *sysfs;
@ -1150,6 +1195,7 @@ static int device_dispatch_io(sd_device_monitor *monitor, sd_device *dev, void *
* change events */
ready = device_is_ready(dev);
_cleanup_set_free_ Set *ready_units = NULL, *not_ready_units = NULL;
(void) device_setup_units(m, dev, &ready_units, &not_ready_units);
if (action == SD_DEVICE_REMOVE) {

View File

@ -29,7 +29,9 @@ struct Device {
DeviceState state, deserialized_state;
DeviceFound found, deserialized_found, enumerated_found;
bool processed; /* Whether udevd has done processing the device, i.e. the device has database and
* ID_PROCESSING=1 udev property is not set. This is used only by enumeration and
* subsequent catchup process. */
bool bind_mounts;
/* The SYSTEMD_WANTS udev property for this device the last time we saw it */

View File

@ -117,10 +117,9 @@ int exec_context_put_load_credential(ExecContext *c, const char *id, const char
return -ENOMEM;
r = hashmap_ensure_put(&c->load_credentials, &exec_load_credential_hash_ops, lc->id, lc);
if (r < 0) {
assert(r != -EEXIST);
if (r < 0)
return r;
}
TAKE_PTR(lc);
}
@ -167,10 +166,9 @@ int exec_context_put_set_credential(
return -ENOMEM;
r = hashmap_ensure_put(&c->set_credentials, &exec_set_credential_hash_ops, sc->id, sc);
if (r < 0) {
assert(r != -EEXIST);
if (r < 0)
return r;
}
TAKE_PTR(sc);
}
@ -193,19 +191,22 @@ int exec_context_put_import_credential(ExecContext *c, const char *glob, const c
*ic = (ExecImportCredential) {
.glob = strdup(glob),
.rename = rename ? strdup(rename) : NULL,
};
if (!ic->glob || (rename && !ic->rename))
if (!ic->glob)
return -ENOMEM;
if (rename) {
ic->rename = strdup(rename);
if (!ic->rename)
return -ENOMEM;
}
if (ordered_set_contains(c->import_credentials, ic))
return 0;
r = ordered_set_ensure_put(&c->import_credentials, &exec_import_credential_hash_ops, ic);
if (r < 0) {
assert(r != -EEXIST);
if (r < 0)
return r;
}
TAKE_PTR(ic);
@ -383,30 +384,46 @@ typedef enum CredentialSearchPath {
_CREDENTIAL_SEARCH_PATH_INVALID = -EINVAL,
} CredentialSearchPath;
static char** credential_search_path(const ExecParameters *params, CredentialSearchPath path) {
static int credential_search_path(const ExecParameters *params, CredentialSearchPath path, char ***ret) {
_cleanup_strv_free_ char **l = NULL;
int r;
assert(params);
assert(path >= 0 && path < _CREDENTIAL_SEARCH_PATH_MAX);
assert(ret);
/* Assemble a search path to find credentials in. For non-encrypted credentials, We'll look in
* /etc/credstore/ (and similar directories in /usr/lib/ + /run/). If we're looking for encrypted
* credentials, we'll look in /etc/credstore.encrypted/ (and similar dirs). */
if (IN_SET(path, CREDENTIAL_SEARCH_PATH_ENCRYPTED, CREDENTIAL_SEARCH_PATH_ALL)) {
if (strv_extend(&l, params->received_encrypted_credentials_directory) < 0)
return NULL;
r = strv_extend(&l, params->received_encrypted_credentials_directory);
if (r < 0)
return r;
if (strv_extend_strv(&l, CONF_PATHS_STRV("credstore.encrypted"), /* filter_duplicates= */ true) < 0)
return NULL;
_cleanup_strv_free_ char **add = NULL;
r = credential_store_path_encrypted(params->runtime_scope, &add);
if (r < 0)
return r;
r = strv_extend_strv_consume(&l, TAKE_PTR(add), /* filter_duplicates= */ false);
if (r < 0)
return r;
}
if (IN_SET(path, CREDENTIAL_SEARCH_PATH_TRUSTED, CREDENTIAL_SEARCH_PATH_ALL)) {
if (strv_extend(&l, params->received_credentials_directory) < 0)
return NULL;
r = strv_extend(&l, params->received_credentials_directory);
if (r < 0)
return r;
if (strv_extend_strv(&l, CONF_PATHS_STRV("credstore"), /* filter_duplicates= */ true) < 0)
return NULL;
_cleanup_strv_free_ char **add = NULL;
r = credential_store_path(params->runtime_scope, &add);
if (r < 0)
return r;
r = strv_extend_strv_consume(&l, TAKE_PTR(add), /* filter_duplicates= */ false);
if (r < 0)
return r;
}
if (DEBUG_LOGGING) {
@ -414,7 +431,8 @@ static char** credential_search_path(const ExecParameters *params, CredentialSea
log_debug("Credential search path is: %s", strempty(t));
}
return TAKE_PTR(l);
*ret = TAKE_PTR(l);
return 0;
}
struct load_cred_args {
@ -445,6 +463,10 @@ static int maybe_decrypt_and_write_credential(
assert(data || size == 0);
if (args->encrypted) {
switch (args->params->runtime_scope) {
case RUNTIME_SCOPE_SYSTEM:
/* In system mode talk directly to the TPM */
r = decrypt_credential_and_warn(
id,
now(CLOCK_REALTIME),
@ -454,6 +476,25 @@ static int maybe_decrypt_and_write_credential(
&IOVEC_MAKE(data, size),
CREDENTIAL_ANY_SCOPE,
&plaintext);
break;
case RUNTIME_SCOPE_USER:
/* In per user mode we'll not have access to the machine secret, nor to the TPM (most
* likely), hence go via the IPC service instead. Do this if we are run in root's
* per-user invocation too, to minimize differences and because isolating this logic
* into a separate process is generally a good thing anyway. */
r = ipc_decrypt_credential(
id,
now(CLOCK_REALTIME),
getuid(),
&IOVEC_MAKE(data, size),
/* flags= */ 0, /* only allow user creds in user scope */
&plaintext);
break;
default:
assert_not_reached();
}
if (r < 0)
return r;
@ -611,9 +652,9 @@ static int load_credential(
* directory we received ourselves. We don't support the AF_UNIX stuff in this mode, since we
* are operating on a credential store, i.e. this is guaranteed to be regular files. */
search_path = credential_search_path(args->params, CREDENTIAL_SEARCH_PATH_ALL);
if (!search_path)
return -ENOMEM;
r = credential_search_path(args->params, CREDENTIAL_SEARCH_PATH_ALL, &search_path);
if (r < 0)
return r;
missing_ok = true;
} else
@ -797,9 +838,9 @@ static int acquire_credentials(
ORDERED_SET_FOREACH(ic, context->import_credentials) {
_cleanup_free_ char **search_path = NULL;
search_path = credential_search_path(params, CREDENTIAL_SEARCH_PATH_TRUSTED);
if (!search_path)
return -ENOMEM;
r = credential_search_path(params, CREDENTIAL_SEARCH_PATH_TRUSTED, &search_path);
if (r < 0)
return r;
args.encrypted = false;
@ -811,9 +852,10 @@ static int acquire_credentials(
return r;
search_path = strv_free(search_path);
search_path = credential_search_path(params, CREDENTIAL_SEARCH_PATH_ENCRYPTED);
if (!search_path)
return -ENOMEM;
r = credential_search_path(params, CREDENTIAL_SEARCH_PATH_ENCRYPTED, &search_path);
if (r < 0)
return r;
args.encrypted = true;

View File

@ -274,6 +274,10 @@
send_interface="org.freedesktop.systemd1.Manager"
send_member="AttachProcessesToUnit"/>
<allow send_destination="org.freedesktop.systemd1"
send_interface="org.freedesktop.systemd1.Manager"
send_member="RemoveSubgroupFromUnit"/>
<allow send_destination="org.freedesktop.systemd1"
send_interface="org.freedesktop.systemd1.Manager"
send_member="CancelJob"/>
@ -432,6 +436,10 @@
send_interface="org.freedesktop.systemd1.Service"
send_member="AttachProcesses"/>
<allow send_destination="org.freedesktop.systemd1"
send_interface="org.freedesktop.systemd1.Service"
send_member="RemoveSubgroupFromUnit"/>
<allow send_destination="org.freedesktop.systemd1"
send_interface="org.freedesktop.systemd1.Service"
send_member="BindMount"/>
@ -446,6 +454,10 @@
send_interface="org.freedesktop.systemd1.Scope"
send_member="AttachProcesses"/>
<allow send_destination="org.freedesktop.systemd1"
send_interface="org.freedesktop.systemd1.Service"
send_member="RemoveSubgroupFromUnit"/>
<allow receive_sender="org.freedesktop.systemd1"/>
</policy>

View File

@ -102,6 +102,8 @@ containeruidbasemin=${container_uid_base_min}
container_uid_base_max={{CONTAINER_UID_BASE_MAX}}
containeruidbasemax=${container_uid_base_max}
foreign_uid_base={{FOREIGN_UID_BASE}}
Name: systemd
Description: systemd System and Service Manager
URL: {{PROJECT_URL}}

View File

@ -3,9 +3,11 @@
#include <unistd.h>
#include "alloc-util.h"
#include "bitfield.h"
#include "creds-util.h"
#include "dropin.h"
#include "errno-util.h"
#include "extract-word.h"
#include "fd-util.h"
#include "fileio.h"
#include "generator.h"
@ -27,6 +29,7 @@ static char **arg_wants = NULL;
static bool arg_debug_shell = false;
static char *arg_debug_tty = NULL;
static char *arg_default_debug_tty = NULL;
static uint32_t arg_breakpoints = 0;
STATIC_DESTRUCTOR_REGISTER(arg_default_unit, freep);
STATIC_DESTRUCTOR_REGISTER(arg_mask, strv_freep);
@ -34,6 +37,91 @@ STATIC_DESTRUCTOR_REGISTER(arg_wants, strv_freep);
STATIC_DESTRUCTOR_REGISTER(arg_debug_tty, freep);
STATIC_DESTRUCTOR_REGISTER(arg_default_debug_tty, freep);
typedef enum BreakpointType {
BREAKPOINT_PRE_UDEV,
BREAKPOINT_PRE_BASIC,
BREAKPOINT_PRE_SYSROOT_MOUNT,
BREAKPOINT_PRE_SWITCH_ROOT,
_BREAKPOINT_TYPE_MAX,
_BREAKPOINT_TYPE_INVALID = -EINVAL,
} BreakpointType;
typedef enum BreakpointValidity {
BREAKPOINT_DEFAULT = 1 << 0,
BREAKPOINT_IN_INITRD = 1 << 1,
BREAKPOINT_ON_HOST = 1 << 2,
} BreakpointValidity;
typedef struct BreakpointInfo {
BreakpointType type;
const char *name;
const char *unit;
BreakpointValidity validity;
} BreakpointInfo;
static const struct BreakpointInfo breakpoint_info_table[_BREAKPOINT_TYPE_MAX] = {
{ BREAKPOINT_PRE_UDEV, "pre-udev", "breakpoint-pre-udev.service", BREAKPOINT_IN_INITRD | BREAKPOINT_ON_HOST },
{ BREAKPOINT_PRE_BASIC, "pre-basic", "breakpoint-pre-basic.service", BREAKPOINT_IN_INITRD | BREAKPOINT_ON_HOST },
{ BREAKPOINT_PRE_SYSROOT_MOUNT, "pre-mount", "breakpoint-pre-mount.service", BREAKPOINT_IN_INITRD },
{ BREAKPOINT_PRE_SWITCH_ROOT, "pre-switch-root", "breakpoint-pre-switch-root.service", BREAKPOINT_IN_INITRD | BREAKPOINT_DEFAULT },
};
static BreakpointType parse_breakpoint_from_string_one(const char *s) {
assert(s);
FOREACH_ARRAY(i, breakpoint_info_table, ELEMENTSOF(breakpoint_info_table))
if (streq(i->name, s))
return i->type;
return _BREAKPOINT_TYPE_INVALID;
}
static int parse_breakpoint_from_string(const char *s, uint32_t *ret_breakpoints) {
uint32_t breakpoints = 0;
int r;
assert(ret_breakpoints);
/* Empty value? set default breakpoint */
if (isempty(s)) {
if (in_initrd()) {
FOREACH_ARRAY(i, breakpoint_info_table, ELEMENTSOF(breakpoint_info_table))
if (i->validity & BREAKPOINT_DEFAULT) {
breakpoints |= 1 << i->type;
break;
}
} else
log_warning("No default breakpoint defined on the host, ignoring breakpoint request from kernel command line.");
} else
for (;;) {
_cleanup_free_ char *t = NULL;
BreakpointType tt;
r = extract_first_word(&s, &t, ",", EXTRACT_DONT_COALESCE_SEPARATORS);
if (r < 0)
return r;
if (r == 0)
break;
tt = parse_breakpoint_from_string_one(t);
if (tt < 0) {
log_warning("Invalid breakpoint value '%s', ignoring.", t);
continue;
}
if (in_initrd() && !FLAGS_SET(breakpoint_info_table[tt].validity, BREAKPOINT_IN_INITRD))
log_warning("Breakpoint '%s' not valid in the initrd, ignoring.", t);
else if (!in_initrd() && !FLAGS_SET(breakpoint_info_table[tt].validity, BREAKPOINT_ON_HOST))
log_warning("Breakpoint '%s' not valid on the host, ignoring.", t);
else
breakpoints |= 1 << tt;
}
*ret_breakpoints = breakpoints;
return 0;
}
static int parse_proc_cmdline_item(const char *key, const char *value, void *data) {
int r;
@ -88,6 +176,15 @@ static int parse_proc_cmdline_item(const char *key, const char *value, void *dat
return free_and_strdup_warn(&arg_default_unit, value);
} else if (streq(key, "systemd.break")) {
uint32_t breakpoints = 0;
r = parse_breakpoint_from_string(value, &breakpoints);
if (r < 0)
return log_warning_errno(r, "Failed to parse breakpoint value '%s': %m", value);
arg_breakpoints |= breakpoints;
} else if (!value) {
const char *target;
@ -269,6 +366,10 @@ static int run(const char *dest, const char *dest_early, const char *dest_late)
RET_GATHER(r, install_debug_shell_dropin());
}
BIT_FOREACH(i, arg_breakpoints)
if (strv_extend(&arg_wants, breakpoint_info_table[i].unit) < 0)
return log_oom();
if (get_credentials_dir(&credentials_dir) >= 0)
RET_GATHER(r, process_unit_credentials(credentials_dir));

View File

@ -45,6 +45,7 @@
#include "process-util.h"
#include "recurse-dir.h"
#include "sha256.h"
#include "shift-uid.h"
#include "stat-util.h"
#include "string-util.h"
#include "strv.h"
@ -68,6 +69,7 @@ static enum {
ACTION_DISCOVER,
ACTION_VALIDATE,
ACTION_MAKE_ARCHIVE,
ACTION_SHIFT,
} arg_action = ACTION_DISSECT;
static char *arg_image = NULL;
static char *arg_root = NULL;
@ -95,6 +97,9 @@ static char *arg_loop_ref = NULL;
static ImagePolicy *arg_image_policy = NULL;
static bool arg_mtree_hash = true;
static bool arg_via_service = false;
static RuntimeScope arg_runtime_scope = _RUNTIME_SCOPE_INVALID;
static uid_t arg_uid_base = UID_INVALID;
static bool arg_all = false;
STATIC_DESTRUCTOR_REGISTER(arg_image, freep);
STATIC_DESTRUCTOR_REGISTER(arg_root, freep);
@ -127,6 +132,7 @@ static int help(void) {
"%1$s [OPTIONS...] --make-archive IMAGE [TARGET]\n"
"%1$s [OPTIONS...] --discover\n"
"%1$s [OPTIONS...] --validate IMAGE\n"
"%1$s [OPTIONS...] --shift IMAGE UIDBASE\n"
"\n%5$sDissect a Discoverable Disk Image (DDI).%6$s\n\n"
"%3$sOptions:%4$s\n"
" --no-pager Do not pipe output into a pager\n"
@ -151,6 +157,9 @@ static int help(void) {
" Generate JSON output\n"
" --loop-ref=NAME Set reference string for loopback device\n"
" --mtree-hash=BOOL Whether to include SHA256 hash in the mtree output\n"
" --user Discover user images\n"
" --system Discover system images\n"
" --all Show hidden images too\n"
"\n%3$sCommands:%4$s\n"
" -h --help Show this help\n"
" --version Show package version\n"
@ -169,6 +178,7 @@ static int help(void) {
" --make-archive Convert the DDI to an archive file\n"
" --discover Discover DDIs in well known directories\n"
" --validate Validate image and image policy\n"
" --shift Shift UID range to selected base\n"
"\nSee the %2$s for details.\n",
program_invocation_short_name,
link,
@ -274,6 +284,10 @@ static int parse_argv(int argc, char *argv[]) {
ARG_VALIDATE,
ARG_MTREE_HASH,
ARG_MAKE_ARCHIVE,
ARG_SHIFT,
ARG_SYSTEM,
ARG_USER,
ARG_ALL,
};
static const struct option options[] = {
@ -307,10 +321,15 @@ static int parse_argv(int argc, char *argv[]) {
{ "validate", no_argument, NULL, ARG_VALIDATE },
{ "mtree-hash", required_argument, NULL, ARG_MTREE_HASH },
{ "make-archive", no_argument, NULL, ARG_MAKE_ARCHIVE },
{ "shift", no_argument, NULL, ARG_SHIFT },
{ "system", no_argument, NULL, ARG_SYSTEM },
{ "user", no_argument, NULL, ARG_USER },
{ "all", no_argument, NULL, ARG_ALL },
{}
};
_cleanup_free_ char **buf = NULL; /* we use free(), not strv_free() here, as we don't copy the strings here */
bool system_scope_requested = false, user_scope_requested = false;
int c, r;
assert(argc >= 0);
@ -531,7 +550,6 @@ static int parse_argv(int argc, char *argv[]) {
break;
case ARG_MAKE_ARCHIVE:
r = dlopen_libarchive();
if (r < 0)
return log_error_errno(r, "Archive support not available (compiled without libarchive, or libarchive not installed?).");
@ -539,6 +557,22 @@ static int parse_argv(int argc, char *argv[]) {
arg_action = ACTION_MAKE_ARCHIVE;
break;
case ARG_SHIFT:
arg_action = ACTION_SHIFT;
break;
case ARG_SYSTEM:
system_scope_requested = true;
break;
case ARG_USER:
user_scope_requested = true;
break;
case ARG_ALL:
arg_all = true;
break;
case '?':
return -EINVAL;
@ -547,6 +581,10 @@ static int parse_argv(int argc, char *argv[]) {
}
}
if (system_scope_requested || user_scope_requested)
arg_runtime_scope = system_scope_requested && user_scope_requested ? _RUNTIME_SCOPE_INVALID :
system_scope_requested ? RUNTIME_SCOPE_SYSTEM : RUNTIME_SCOPE_USER;
switch (arg_action) {
case ACTION_DISSECT:
@ -704,6 +742,33 @@ static int parse_argv(int argc, char *argv[]) {
arg_flags &= ~(DISSECT_IMAGE_PIN_PARTITION_DEVICES|DISSECT_IMAGE_ADD_PARTITION_DEVICES);
break;
case ACTION_SHIFT:
if (optind + 2 != argc)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL),
"Expected an image path and a UID base as only argument.");
r = parse_image_path_argument(argv[optind], &arg_root, &arg_image);
if (r < 0)
return r;
if (streq(argv[optind + 1], "foreign"))
arg_uid_base = FOREIGN_UID_BASE;
else {
r = parse_uid(argv[optind + 1], &arg_uid_base);
if (r < 0)
return log_error_errno(r, "Failed to parse UID base: %s", argv[optind + 1]);
if ((arg_uid_base & 0xFFFF) != 0)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Selected UID base not a multiple of 64K: " UID_FMT, arg_uid_base);
if (arg_uid_base != 0 &&
!uid_is_container(arg_uid_base) &&
!uid_is_foreign(arg_uid_base))
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Selected UID range is not in the container range, nor the foreign one, refusing.");
}
arg_flags |= DISSECT_IMAGE_REQUIRE_ROOT;
break;
default:
assert_not_reached();
}
@ -1172,7 +1237,7 @@ static const char *pick_color_for_uid_gid(uid_t uid) {
return ansi_normal(); /* files in disk images are typically owned by root and other system users, no issue there */
if (uid_is_dynamic(uid))
return ansi_highlight_red(); /* files should never be owned persistently by dynamic users, and there are just no excuses */
if (uid_is_container(uid))
if (uid_is_container(uid) || uid_is_foreign(uid))
return ansi_highlight_cyan();
return ansi_highlight();
@ -1417,7 +1482,7 @@ static int action_list_or_mtree_or_copy_or_make_archive(DissectedImage *m, LoopD
const char *root;
int r;
assert(IN_SET(arg_action, ACTION_LIST, ACTION_MTREE, ACTION_COPY_FROM, ACTION_COPY_TO, ACTION_MAKE_ARCHIVE));
assert(IN_SET(arg_action, ACTION_LIST, ACTION_MTREE, ACTION_COPY_FROM, ACTION_COPY_TO, ACTION_MAKE_ARCHIVE, ACTION_SHIFT));
if (arg_image) {
assert(m);
@ -1445,7 +1510,7 @@ static int action_list_or_mtree_or_copy_or_make_archive(DissectedImage *m, LoopD
if (r < 0)
return r;
mounted_dir = TAKE_PTR(t);
root = mounted_dir = TAKE_PTR(t);
if (d) {
r = loop_device_flock(d, LOCK_UN);
@ -1456,11 +1521,10 @@ static int action_list_or_mtree_or_copy_or_make_archive(DissectedImage *m, LoopD
r = dissected_image_relinquish(m);
if (r < 0)
return log_error_errno(r, "Failed to relinquish DM and loopback block devices: %m");
}
root = mounted_dir ?: arg_root;
dissected_image_close(m);
} else
root = arg_root;
switch (arg_action) {
@ -1673,6 +1737,13 @@ static int action_list_or_mtree_or_copy_or_make_archive(DissectedImage *m, LoopD
#endif
}
case ACTION_SHIFT:
r = path_patch_uid(root, arg_uid_base, 0x10000);
if (r < 0)
return log_error_errno(r, "Failed to shift UID base: %m");
return 0;
default:
assert_not_reached();
}
@ -1851,7 +1922,7 @@ static int action_discover(void) {
return log_oom();
for (ImageClass cl = 0; cl < _IMAGE_CLASS_MAX; cl++) {
r = image_discover(cl, NULL, images);
r = image_discover(arg_runtime_scope, cl, NULL, images);
if (r < 0)
return log_error_errno(r, "Failed to discover images: %m");
}
@ -1870,18 +1941,21 @@ static int action_discover(void) {
HASHMAP_FOREACH(img, images) {
if (!IN_SET(img->type, IMAGE_RAW, IMAGE_BLOCK))
if (!arg_all && startswith(img->name, "."))
continue;
r = table_add_many(
t,
TABLE_STRING, img->name,
TABLE_SET_COLOR, startswith(img->name, ".") ? ANSI_GREY : NULL,
TABLE_STRING, image_type_to_string(img->type),
TABLE_STRING, image_class_to_string(img->class),
TABLE_BOOLEAN, img->read_only,
TABLE_SET_COLOR, !img->read_only ? ANSI_HIGHLIGHT_GREEN : ANSI_HIGHLIGHT_RED,
TABLE_PATH, img->path,
TABLE_TIMESTAMP, img->mtime != 0 ? img->mtime : img->crtime,
TABLE_SIZE, img->usage);
TABLE_SIZE, img->usage,
TABLE_SET_COLOR, img->usage <= 0 ? ANSI_HIGHLIGHT_RED : NULL);
if (r < 0)
return table_log_add_error(r);
}
@ -2036,6 +2110,16 @@ static int run(int argc, char *argv[]) {
return r;
}
if (arg_root) {
r = path_pick_update_warn(
&arg_root,
&pick_filter_image_dir,
PICK_ARCHITECTURE|PICK_TRIES,
/* ret_result= */ NULL);
if (r < 0)
return r;
}
switch (arg_action) {
case ACTION_UMOUNT:
return action_umount(arg_path);
@ -2082,7 +2166,7 @@ static int run(int argc, char *argv[]) {
else
r = loop_device_make_by_path(arg_image, open_flags, /* sector_size= */ UINT32_MAX, loop_flags, LOCK_SH, &d);
if (r < 0) {
if (!ERRNO_IS_PRIVILEGE(r) || !IN_SET(arg_action, ACTION_DISSECT, ACTION_LIST, ACTION_MTREE, ACTION_COPY_FROM, ACTION_COPY_TO))
if (!ERRNO_IS_PRIVILEGE(r) || !IN_SET(arg_action, ACTION_DISSECT, ACTION_LIST, ACTION_MTREE, ACTION_COPY_FROM, ACTION_COPY_TO, ACTION_SHIFT))
return log_error_errno(r, "Failed to set up loopback device for %s: %m", arg_image);
log_debug_errno(r, "Lacking permissions to set up loopback block device for %s, using service: %m", arg_image);
@ -2167,6 +2251,7 @@ static int run(int argc, char *argv[]) {
case ACTION_COPY_FROM:
case ACTION_COPY_TO:
case ACTION_MAKE_ARCHIVE:
case ACTION_SHIFT:
return action_list_or_mtree_or_copy_or_make_archive(m, d, userns_fd);
case ACTION_WITH:

View File

@ -28,20 +28,33 @@
#include "memory-util-fundamental.h"
#include "sha1-fundamental.h"
static void get_chid(const char16_t *const smbios_fields[static _CHID_SMBIOS_FIELDS_MAX], uint32_t mask, EFI_GUID *ret_chid) {
static void get_chid(
const char16_t *const smbios_fields[static _CHID_SMBIOS_FIELDS_MAX],
uint32_t mask,
EFI_GUID *ret_chid) {
assert(mask != 0);
assert(ret_chid);
const EFI_GUID namespace = { UINT32_C(0x12d8ff70), UINT16_C(0x7f4c), UINT16_C(0x7d4c), {} }; /* Swapped to BE */
struct sha1_ctx ctx = {};
sha1_init_ctx(&ctx);
static const EFI_GUID namespace = { UINT32_C(0x12d8ff70), UINT16_C(0x7f4c), UINT16_C(0x7d4c), {} }; /* Swapped to BE */
sha1_process_bytes(&namespace, sizeof(namespace), &ctx);
for (unsigned i = 0; i < _CHID_SMBIOS_FIELDS_MAX; i++)
if ((mask >> i) & 1) {
for (ChidSmbiosFields i = 0; i < _CHID_SMBIOS_FIELDS_MAX; i++) {
if (!FLAGS_SET(mask, UINT32_C(1) << i))
continue;
if (!smbios_fields[i]) {
/* If some SMBIOS field is missing, don't generate the CHID, as per spec */
memzero(ret_chid, sizeof(EFI_GUID));
return;
}
if (i > 0)
sha1_process_bytes(L"&", 2, &ctx);
sha1_process_bytes(smbios_fields[i], strlen16(smbios_fields[i]) * sizeof(char16_t), &ctx);
}
@ -61,7 +74,31 @@ static void get_chid(const char16_t *const smbios_fields[static _CHID_SMBIOS_FIE
ret_chid->Data4[0] = (ret_chid->Data4[0] & UINT8_C(0x3f)) | UINT8_C(0x80);
}
static const uint32_t chid_smbios_table[CHID_TYPES_MAX] = {
const uint32_t chid_smbios_table[CHID_TYPES_MAX] = {
[0] = (UINT32_C(1) << CHID_SMBIOS_MANUFACTURER) |
(UINT32_C(1) << CHID_SMBIOS_FAMILY) |
(UINT32_C(1) << CHID_SMBIOS_PRODUCT_NAME) |
(UINT32_C(1) << CHID_SMBIOS_PRODUCT_SKU) |
(UINT32_C(1) << CHID_SMBIOS_BIOS_VENDOR) |
(UINT32_C(1) << CHID_SMBIOS_BIOS_VERSION) |
(UINT32_C(1) << CHID_SMBIOS_BIOS_MAJOR) |
(UINT32_C(1) << CHID_SMBIOS_BIOS_MINOR),
[1] = (UINT32_C(1) << CHID_SMBIOS_MANUFACTURER) |
(UINT32_C(1) << CHID_SMBIOS_FAMILY) |
(UINT32_C(1) << CHID_SMBIOS_PRODUCT_NAME) |
(UINT32_C(1) << CHID_SMBIOS_BIOS_VENDOR) |
(UINT32_C(1) << CHID_SMBIOS_BIOS_VERSION) |
(UINT32_C(1) << CHID_SMBIOS_BIOS_MAJOR) |
(UINT32_C(1) << CHID_SMBIOS_BIOS_MINOR),
[2] = (UINT32_C(1) << CHID_SMBIOS_MANUFACTURER) |
(UINT32_C(1) << CHID_SMBIOS_PRODUCT_NAME) |
(UINT32_C(1) << CHID_SMBIOS_BIOS_VENDOR) |
(UINT32_C(1) << CHID_SMBIOS_BIOS_VERSION) |
(UINT32_C(1) << CHID_SMBIOS_BIOS_MAJOR) |
(UINT32_C(1) << CHID_SMBIOS_BIOS_MINOR),
[3] = (UINT32_C(1) << CHID_SMBIOS_MANUFACTURER) |
(UINT32_C(1) << CHID_SMBIOS_FAMILY) |
(UINT32_C(1) << CHID_SMBIOS_PRODUCT_NAME) |
@ -102,18 +139,26 @@ static const uint32_t chid_smbios_table[CHID_TYPES_MAX] = {
[11] = (UINT32_C(1) << CHID_SMBIOS_MANUFACTURER) |
(UINT32_C(1) << CHID_SMBIOS_FAMILY),
[12] = (UINT32_C(1) << CHID_SMBIOS_MANUFACTURER) |
(UINT32_C(1) << CHID_SMBIOS_ENCLOSURE_TYPE),
[13] = (UINT32_C(1) << CHID_SMBIOS_MANUFACTURER) |
(UINT32_C(1) << CHID_SMBIOS_BASEBOARD_MANUFACTURER) |
(UINT32_C(1) << CHID_SMBIOS_BASEBOARD_PRODUCT),
[14] = (UINT32_C(1) << CHID_SMBIOS_MANUFACTURER),
};
void chid_calculate(const char16_t *const smbios_fields[static _CHID_SMBIOS_FIELDS_MAX], EFI_GUID ret_chids[static CHID_TYPES_MAX]) {
assert(smbios_fields);
assert(ret_chids);
for (size_t i = 0; i < CHID_TYPES_MAX; i++)
if (chid_smbios_table[i] != 0)
get_chid(smbios_fields, chid_smbios_table[i], &ret_chids[i]);
else
for (size_t i = 0; i < CHID_TYPES_MAX; i++) {
if (chid_smbios_table[i] == 0) {
memzero(&ret_chids[i], sizeof(EFI_GUID));
continue;
}
get_chid(smbios_fields, chid_smbios_table[i], &ret_chids[i]);
}
}

View File

@ -20,8 +20,15 @@ typedef enum ChidSmbiosFields {
CHID_SMBIOS_PRODUCT_SKU,
CHID_SMBIOS_BASEBOARD_MANUFACTURER,
CHID_SMBIOS_BASEBOARD_PRODUCT,
CHID_SMBIOS_BIOS_VENDOR,
CHID_SMBIOS_BIOS_VERSION,
CHID_SMBIOS_BIOS_MAJOR,
CHID_SMBIOS_BIOS_MINOR,
CHID_SMBIOS_ENCLOSURE_TYPE,
_CHID_SMBIOS_FIELDS_MAX,
} ChidSmbiosFields;
extern const uint32_t chid_smbios_table[CHID_TYPES_MAX];
/* CHID (also called HWID by fwupd) is described at https://github.com/fwupd/fwupd/blob/main/docs/hwids.md */
void chid_calculate(const char16_t *const smbios_fields[static _CHID_SMBIOS_FIELDS_MAX], EFI_GUID ret_chids[static CHID_TYPES_MAX]);

View File

@ -25,6 +25,7 @@
static ImportCompressType arg_compress = IMPORT_COMPRESS_UNKNOWN;
static ImageClass arg_class = IMAGE_MACHINE;
static RuntimeScope arg_runtime_scope = _RUNTIME_SCOPE_INVALID;
static void determine_compression_from_filename(const char *p) {
@ -66,7 +67,7 @@ static int export_tar(int argc, char *argv[], void *userdata) {
local = argv[1];
if (image_name_is_valid(local)) {
r = image_find(arg_class, local, NULL, &image);
r = image_find(arg_runtime_scope, arg_class, local, NULL, &image);
if (r == -ENOENT)
return log_error_errno(r, "Image %s not found.", local);
if (r < 0)
@ -139,7 +140,7 @@ static int export_raw(int argc, char *argv[], void *userdata) {
local = argv[1];
if (image_name_is_valid(local)) {
r = image_find(arg_class, local, NULL, &image);
r = image_find(arg_runtime_scope, arg_class, local, NULL, &image);
if (r == -ENOENT)
return log_error_errno(r, "Image %s not found.", local);
if (r < 0)

View File

@ -34,6 +34,7 @@ static bool arg_sync = true;
static bool arg_direct = false;
static const char *arg_image_root = NULL;
static ImageClass arg_class = IMAGE_MACHINE;
static RuntimeScope arg_runtime_scope = _RUNTIME_SCOPE_INVALID;
typedef struct ProgressInfo {
RateLimit limit;
@ -145,7 +146,7 @@ static int import_fs(int argc, char *argv[], void *userdata) {
return log_oom();
if (!arg_force) {
r = image_find(arg_class, local, NULL, NULL);
r = image_find(arg_runtime_scope, arg_class, local, NULL, NULL);
if (r < 0) {
if (r != -ENOENT)
return log_error_errno(r, "Failed to check whether image '%s' exists: %m", local);

View File

@ -30,6 +30,7 @@ static const char *arg_image_root = NULL;
static ImportFlags arg_import_flags = IMPORT_BTRFS_SUBVOL | IMPORT_BTRFS_QUOTA | IMPORT_CONVERT_QCOW2 | IMPORT_SYNC;
static uint64_t arg_offset = UINT64_MAX, arg_size_max = UINT64_MAX;
static ImageClass arg_class = IMAGE_MACHINE;
static RuntimeScope arg_runtime_scope = _RUNTIME_SCOPE_INVALID;
static int normalize_local(const char *local, char **ret) {
_cleanup_free_ char *ll = NULL;
@ -63,7 +64,7 @@ static int normalize_local(const char *local, char **ret) {
local = "imported";
if (!FLAGS_SET(arg_import_flags, IMPORT_FORCE)) {
r = image_find(arg_class, local, NULL, NULL);
r = image_find(arg_runtime_scope, arg_class, local, NULL, NULL);
if (r < 0) {
if (r != -ENOENT)
return log_error_errno(r, "Failed to check whether image '%s' exists: %m", local);

View File

@ -111,6 +111,8 @@ struct Manager {
bool use_btrfs_subvol;
bool use_btrfs_quota;
RuntimeScope runtime_scope; /* for now: always RUNTIME_SCOPE_SYSTEM */
};
#define TRANSFERS_MAX 64
@ -721,6 +723,7 @@ static int manager_new(Manager **ret) {
*m = (Manager) {
.use_btrfs_subvol = true,
.use_btrfs_quota = true,
.runtime_scope = RUNTIME_SCOPE_SYSTEM,
};
r = sd_event_default(&m->event);
@ -1332,6 +1335,7 @@ static int method_cancel_transfer(sd_bus_message *msg, void *userdata, sd_bus_er
static int method_list_images(sd_bus_message *msg, void *userdata, sd_bus_error *error) {
_cleanup_(sd_bus_message_unrefp) sd_bus_message *reply = NULL;
ImageClass class = _IMAGE_CLASS_INVALID;
Manager *m = ASSERT_PTR(userdata);
int r;
assert(msg);
@ -1372,7 +1376,7 @@ static int method_list_images(sd_bus_message *msg, void *userdata, sd_bus_error
if (!h)
return -ENOMEM;
r = image_discover(c, /* root= */ NULL, h);
r = image_discover(m->runtime_scope, c, /* root= */ NULL, h);
if (r < 0) {
if (class >= 0)
return r;

View File

@ -33,6 +33,7 @@ static ImportFlags arg_import_flags = IMPORT_PULL_SETTINGS | IMPORT_PULL_ROOTHAS
static uint64_t arg_offset = UINT64_MAX, arg_size_max = UINT64_MAX;
static char *arg_checksum = NULL;
static ImageClass arg_class = IMAGE_MACHINE;
static RuntimeScope arg_runtime_scope = _RUNTIME_SCOPE_INVALID;
STATIC_DESTRUCTOR_REGISTER(arg_checksum, freep);
@ -66,7 +67,7 @@ static int normalize_local(const char *local, const char *url, char **ret) {
local);
if (!FLAGS_SET(arg_import_flags, IMPORT_FORCE)) {
r = image_find(arg_class, local, NULL, NULL);
r = image_find(arg_runtime_scope, arg_class, local, NULL, NULL);
if (r < 0) {
if (r != -ENOENT)
return log_error_errno(r, "Failed to check whether image '%s' exists: %m", local);

View File

@ -132,6 +132,7 @@ static int client_context_new(Server *s, pid_t pid, ClientContext **ret) {
.log_level_max = -1,
.log_ratelimit_interval = s->ratelimit_interval,
.log_ratelimit_burst = s->ratelimit_burst,
.capability_quintet = CAPABILITY_QUINTET_NULL,
};
r = hashmap_ensure_put(&s->client_contexts, NULL, PID_TO_PTR(pid), c);
@ -154,7 +155,6 @@ static void client_context_reset(Server *s, ClientContext *c) {
c->comm = mfree(c->comm);
c->exe = mfree(c->exe);
c->cmdline = mfree(c->cmdline);
c->capeff = mfree(c->capeff);
c->auditid = AUDIT_SESSION_INVALID;
c->loginuid = UID_INVALID;
@ -184,6 +184,8 @@ static void client_context_reset(Server *s, ClientContext *c) {
c->log_filter_allowed_patterns = set_free_free(c->log_filter_allowed_patterns);
c->log_filter_denied_patterns = set_free_free(c->log_filter_denied_patterns);
c->capability_quintet = CAPABILITY_QUINTET_NULL;
}
static ClientContext* client_context_free(Server *s, ClientContext *c) {
@ -233,8 +235,7 @@ static void client_context_read_basic(ClientContext *c) {
if (pid_get_cmdline(c->pid, SIZE_MAX, PROCESS_CMDLINE_QUOTE, &t) >= 0)
free_and_replace(c->cmdline, t);
if (get_process_capeff(c->pid, &t) >= 0)
free_and_replace(c->capeff, t);
(void) pidref_get_capability(&PIDREF_MAKE_FROM_PID(c->pid), &c->capability_quintet);
}
static int client_context_read_label(

View File

@ -7,6 +7,7 @@
#include "sd-id128.h"
#include "capability-util.h"
#include "set.h"
#include "time-util.h"
@ -27,7 +28,7 @@ struct ClientContext {
char *comm;
char *exe;
char *cmdline;
char *capeff;
CapabilityQuintet capability_quintet;
uint32_t auditid;
uid_t loginuid;

View File

@ -1109,7 +1109,7 @@ static void server_dispatch_message_real(
* Let's use a heap allocation for this one. */
cmdline1 = set_iovec_string_field(iovec, &n, "_CMDLINE=", c->cmdline);
IOVEC_ADD_STRING_FIELD(iovec, n, c->capeff, "_CAP_EFFECTIVE"); /* Read from /proc/.../status */
IOVEC_ADD_NUMERIC_FIELD(iovec, n, c->capability_quintet.effective, uint64_t, capability_is_set, "%" PRIx64, "_CAP_EFFECTIVE");
IOVEC_ADD_SIZED_FIELD(iovec, n, c->label, c->label_size, "_SELINUX_CONTEXT");
IOVEC_ADD_NUMERIC_FIELD(iovec, n, c->auditid, uint32_t, audit_session_is_valid, "%" PRIu32, "_AUDIT_SESSION");
IOVEC_ADD_NUMERIC_FIELD(iovec, n, c->loginuid, uid_t, uid_is_valid, UID_FMT, "_AUDIT_LOGINUID");
@ -1144,7 +1144,7 @@ static void server_dispatch_message_real(
if (o->cmdline)
cmdline2 = set_iovec_string_field(iovec, &n, "OBJECT_CMDLINE=", o->cmdline);
IOVEC_ADD_STRING_FIELD(iovec, n, o->capeff, "OBJECT_CAP_EFFECTIVE");
IOVEC_ADD_NUMERIC_FIELD(iovec, n, o->capability_quintet.effective, uint64_t, capability_is_set, "%" PRIx64, "OBJECT_CAP_EFFECTIVE");
IOVEC_ADD_SIZED_FIELD(iovec, n, o->label, o->label_size, "OBJECT_SELINUX_CONTEXT");
IOVEC_ADD_NUMERIC_FIELD(iovec, n, o->auditid, uint32_t, audit_session_is_valid, "%" PRIu32, "OBJECT_AUDIT_SESSION");
IOVEC_ADD_NUMERIC_FIELD(iovec, n, o->loginuid, uid_t, uid_is_valid, UID_FMT, "OBJECT_AUDIT_LOGINUID");

View File

@ -84,3 +84,19 @@ static inline char** generator_binary_paths(RuntimeScope runtime_scope) {
static inline char** env_generator_binary_paths(RuntimeScope runtime_scope) {
return generator_binary_paths_internal(runtime_scope, true);
}
static inline int credential_store_path(RuntimeScope runtime_scope, char ***ret) {
return sd_path_lookup_strv(
runtime_scope == RUNTIME_SCOPE_SYSTEM ?
SD_PATH_SYSTEM_SEARCH_CREDENTIAL_STORE : SD_PATH_USER_SEARCH_CREDENTIAL_STORE,
/* suffix= */ NULL,
ret);
}
static inline int credential_store_path_encrypted(RuntimeScope runtime_scope, char ***ret) {
return sd_path_lookup_strv(
runtime_scope == RUNTIME_SCOPE_SYSTEM ?
SD_PATH_SYSTEM_SEARCH_CREDENTIAL_STORE_ENCRYPTED : SD_PATH_USER_SEARCH_CREDENTIAL_STORE_ENCRYPTED,
/* suffix= */ NULL,
ret);
}

View File

@ -36,7 +36,12 @@ static int from_environment(const char *envname, const char *fallback, const cha
return -ENXIO;
}
static int from_home_dir(const char *envname, const char *suffix, char **buffer, const char **ret) {
static int from_home_dir(
const char *envname,
const char *suffix,
char **buffer,
const char **ret) {
_cleanup_free_ char *h = NULL;
int r;
@ -350,6 +355,30 @@ static int get_path(uint64_t type, char **buffer, const char **ret) {
case SD_PATH_SYSTEMD_USER_ENVIRONMENT_GENERATOR:
*ret = USER_ENV_GENERATOR_DIR;
return 0;
case SD_PATH_SYSTEM_CREDENTIAL_STORE:
*ret = "/etc/credstore";
return 0;
case SD_PATH_SYSTEM_CREDENTIAL_STORE_ENCRYPTED:
*ret = "/etc/credstore.encrypted";
return 0;
case SD_PATH_USER_CREDENTIAL_STORE:
r = xdg_user_config_dir("credstore", buffer);
if (r < 0)
return r;
*ret = *buffer;
return 0;
case SD_PATH_USER_CREDENTIAL_STORE_ENCRYPTED:
r = xdg_user_config_dir("credstore.encrypted", buffer);
if (r < 0)
return r;
*ret = *buffer;
return 0;
}
return -EOPNOTSUPP;
@ -366,12 +395,12 @@ static int get_path_alloc(uint64_t type, const char *suffix, char **ret) {
if (r < 0)
return r;
if (suffix) {
if (!isempty(suffix)) {
char *suffixed = path_join(p, suffix);
if (!suffixed)
return -ENOMEM;
path_simplify(suffixed);
path_simplify_full(suffixed, PATH_SIMPLIFY_KEEP_TRAILING_SLASH);
free_and_replace(buffer, suffixed);
} else if (!buffer) {
@ -601,8 +630,55 @@ static int get_search(uint64_t type, char ***ret) {
case SD_PATH_SYSTEMD_SEARCH_NETWORK:
return strv_from_nulstr(ret, NETWORK_DIRS_NULSTR);
case SD_PATH_SYSTEM_SEARCH_CREDENTIAL_STORE:
case SD_PATH_SYSTEM_SEARCH_CREDENTIAL_STORE_ENCRYPTED: {
const char *suffix =
type == SD_PATH_SYSTEM_SEARCH_CREDENTIAL_STORE_ENCRYPTED ? "credstore.encrypted" : "credstore";
_cleanup_strv_free_ char **l = NULL;
FOREACH_STRING(d, CONF_PATHS("")) {
char *j = path_join(d, suffix);
if (!j)
return -ENOMEM;
r = strv_consume(&l, TAKE_PTR(j));
if (r < 0)
return r;
}
*ret = TAKE_PTR(l);
return 0;
}
case SD_PATH_USER_SEARCH_CREDENTIAL_STORE:
case SD_PATH_USER_SEARCH_CREDENTIAL_STORE_ENCRYPTED: {
const char *suffix =
type == SD_PATH_USER_SEARCH_CREDENTIAL_STORE_ENCRYPTED ? "credstore.encrypted" : "credstore";
static const uint64_t dirs[] = {
SD_PATH_USER_CONFIGURATION,
SD_PATH_USER_RUNTIME,
SD_PATH_USER_LIBRARY_PRIVATE,
};
_cleanup_strv_free_ char **l = NULL;
FOREACH_ELEMENT(d, dirs) {
_cleanup_free_ char *p = NULL;
r = sd_path_lookup(*d, suffix, &p);
if (r == -ENXIO)
continue;
if (r < 0)
return r;
r = strv_consume(&l, TAKE_PTR(p));
if (r < 0)
return r;
}
*ret = TAKE_PTR(l);
return 0;
}}
return -EOPNOTSUPP;
}
@ -637,7 +713,7 @@ _public_ int sd_path_lookup_strv(uint64_t type, const char *suffix, char ***ret)
if (!path_extend(i, suffix))
return -ENOMEM;
path_simplify(*i);
path_simplify_full(*i, PATH_SIMPLIFY_KEEP_TRAILING_SLASH);
}
*ret = TAKE_PTR(l);

View File

@ -178,7 +178,7 @@ int bus_image_method_clone(
return sd_bus_error_set_errnof(error, r, "Failed to fork(): %m");
if (r == 0) {
errno_pipe_fd[0] = safe_close(errno_pipe_fd[0]);
r = image_clone(image, new_name, read_only);
r = image_clone(image, new_name, read_only, m->runtime_scope);
report_errno_and_exit(errno_pipe_fd[1], r);
}
@ -402,6 +402,7 @@ char* image_bus_path(const char *name) {
static int image_node_enumerator(sd_bus *bus, const char *path, void *userdata, char ***nodes, sd_bus_error *error) {
_cleanup_hashmap_free_ Hashmap *images = NULL;
_cleanup_strv_free_ char **l = NULL;
Manager *m = ASSERT_PTR(userdata);
Image *image;
int r;
@ -413,7 +414,7 @@ static int image_node_enumerator(sd_bus *bus, const char *path, void *userdata,
if (!images)
return -ENOMEM;
r = image_discover(IMAGE_MACHINE, NULL, images);
r = image_discover(m->runtime_scope, IMAGE_MACHINE, NULL, images);
if (r < 0)
return r;

View File

@ -148,7 +148,7 @@ int vl_method_clone_image(sd_varlink *link, sd_json_variant *parameters, sd_varl
return log_debug_errno(r, "Failed to fork: %m");
if (r == 0) {
errno_pipe_fd[0] = safe_close(errno_pipe_fd[0]);
r = image_clone(image, p.new_name, p.read_only > 0);
r = image_clone(image, p.new_name, p.read_only > 0, manager->runtime_scope);
report_errno_and_exit(errno_pipe_fd[1], r);
}

View File

@ -440,7 +440,7 @@ int manager_acquire_image(Manager *m, const char *name, Image **ret) {
return log_debug_errno(r, "Failed to enable source: %m") ;
_cleanup_(image_unrefp) Image *image = NULL;
r = image_find(IMAGE_MACHINE, name, NULL, &image);
r = image_find(m->runtime_scope, IMAGE_MACHINE, name, NULL, &image);
if (r < 0)
return log_debug_errno(r, "Failed to find image: %m");
@ -467,7 +467,7 @@ int rename_image_and_update_cache(Manager *m, Image *image, const char* new_name
/* The image is cached with its name, hence it is necessary to remove from the cache before renaming. */
assert_se(hashmap_remove_value(m->image_cache, image->name, image));
r = image_rename(image, new_name);
r = image_rename(image, new_name, m->runtime_scope);
if (r < 0) {
image = image_unref(image);
return r;

View File

@ -123,7 +123,7 @@ static int method_get_image(sd_bus_message *message, void *userdata, sd_bus_erro
if (r < 0)
return r;
r = image_find(IMAGE_MACHINE, name, NULL, NULL);
r = image_find(m->runtime_scope, IMAGE_MACHINE, name, NULL, NULL);
if (r == -ENOENT)
return sd_bus_error_setf(error, BUS_ERROR_NO_SUCH_IMAGE, "No image '%s' known", name);
if (r < 0)
@ -476,7 +476,7 @@ static int method_list_images(sd_bus_message *message, void *userdata, sd_bus_er
if (!images)
return -ENOMEM;
r = image_discover(IMAGE_MACHINE, NULL, images);
r = image_discover(m->runtime_scope, IMAGE_MACHINE, NULL, images);
if (r < 0)
return r;
@ -753,7 +753,7 @@ static int method_clean_pool(sd_bus_message *message, void *userdata, sd_bus_err
goto child_fail;
}
r = image_discover(IMAGE_MACHINE, NULL, images);
r = image_discover(m->runtime_scope, IMAGE_MACHINE, NULL, images);
if (r < 0)
goto child_fail;

View File

@ -641,6 +641,7 @@ static int list_image_one_and_maybe_read_metadata(sd_varlink *link, Image *image
}
static int vl_method_list_images(sd_varlink *link, sd_json_variant *parameters, sd_varlink_method_flags_t flags, void *userdata) {
Manager *m = ASSERT_PTR(userdata);
struct params {
const char *image_name;
AcquireMetadata acquire_metadata;
@ -667,7 +668,7 @@ static int vl_method_list_images(sd_varlink *link, sd_json_variant *parameters,
if (!image_name_is_valid(p.image_name))
return sd_varlink_error_invalid_parameter_name(link, "name");
r = image_find(IMAGE_MACHINE, p.image_name, /* root = */ NULL, &found);
r = image_find(m->runtime_scope, IMAGE_MACHINE, p.image_name, /* root = */ NULL, &found);
if (r == -ENOENT)
return sd_varlink_error(link, "io.systemd.MachineImage.NoSuchImage", NULL);
if (r < 0)
@ -683,7 +684,7 @@ static int vl_method_list_images(sd_varlink *link, sd_json_variant *parameters,
if (!images)
return -ENOMEM;
r = image_discover(IMAGE_MACHINE, /* root = */ NULL, images);
r = image_discover(m->runtime_scope, IMAGE_MACHINE, /* root = */ NULL, images);
if (r < 0)
return log_debug_errno(r, "Failed to discover images: %m");

View File

@ -40,10 +40,14 @@ static int manager_new(Manager **ret) {
assert(ret);
m = new0(Manager, 1);
m = new(Manager, 1);
if (!m)
return -ENOMEM;
*m = (Manager) {
.runtime_scope = RUNTIME_SCOPE_SYSTEM,
};
m->machines = hashmap_new(&machine_hash_ops);
if (!m->machines)
return -ENOMEM;

View File

@ -42,6 +42,8 @@ struct Manager {
sd_varlink_server *varlink_userdb_server;
sd_varlink_server *varlink_machine_server;
RuntimeScope runtime_scope; /* for now: always RUNTIME_SCOPE_SYSTEM */
};
int manager_add_machine(Manager *m, const char *name, Machine **ret);

View File

@ -67,4 +67,53 @@
<annotate key="org.freedesktop.policykit.imply">io.systemd.mount-file-system.mount-image-privately</annotate>
</action>
<!-- Allow mounting directories into the host user namespace -->
<action id="io.systemd.mount-file-system.mount-directory">
<!-- If the directory is owned by the user (or by the foreign UID range, with a parent
directory owned by the user), make little restrictions -->
<description gettext-domain="systemd">Allow mounting of directory</description>
<message gettext-domain="systemd">Authentication is required for an application to mount directory $(directory).</message>
<defaults>
<allow_any>auth_admin_keep</allow_any>
<allow_inactive>auth_admin_keep</allow_inactive>
<allow_active>yes</allow_active>
</defaults>
</action>
<action id="io.systemd.mount-file-system.mount-untrusted-directory">
<!-- If the directory is owned by an other user, require authentication -->
<description gettext-domain="systemd">Allow mounting of untrusted directory</description>
<message gettext-domain="systemd">Authentication is required for an application to mount directory $(directory) which is not owned by the user.</message>
<defaults>
<allow_any>auth_admin</allow_any>
<allow_inactive>auth_admin</allow_inactive>
<allow_active>auth_admin</allow_active>
</defaults>
<annotate key="org.freedesktop.policykit.imply">io.systemd.mount-file-system.mount-directory</annotate>
</action>
<!-- Allow mounting directories into a private user namespace -->
<action id="io.systemd.mount-file-system.mount-directory-privately">
<description gettext-domain="systemd">Allow private mounting of directory</description>
<message gettext-domain="systemd">Authentication is required for an application to privately mount directory $(directory).</message>
<defaults>
<allow_any>yes</allow_any>
<allow_inactive>yes</allow_inactive>
<allow_active>yes</allow_active>
</defaults>
</action>
<action id="io.systemd.mount-file-system.mount-untrusted-directory-privately">
<description gettext-domain="systemd">Allow private mounting of untrusted directory</description>
<message gettext-domain="systemd">Authentication is required for an application to privately mount directory $(directory) which is not owned by the user.</message>
<defaults>
<allow_any>auth_admin</allow_any>
<allow_inactive>auth_admin</allow_inactive>
<allow_active>auth_admin</allow_active>
</defaults>
<annotate key="org.freedesktop.policykit.imply">io.systemd.mount-file-system.mount-directory-privately</annotate>
</action>
</policyconfig>

View File

@ -1,5 +1,7 @@
/* SPDX-License-Identifier: LGPL-2.1-or-later */
#include <sys/mount.h>
#include "sd-daemon.h"
#include "sd-varlink.h"
@ -15,12 +17,17 @@
#include "json-util.h"
#include "main-func.h"
#include "missing_loop.h"
#include "missing_mount.h"
#include "missing_syscall.h"
#include "namespace-util.h"
#include "nsresource.h"
#include "nulstr-util.h"
#include "os-util.h"
#include "process-util.h"
#include "stat-util.h"
#include "string-table.h"
#include "uid-classification.h"
#include "uid-range.h"
#include "user-util.h"
#include "varlink-io.systemd.MountFileSystem.h"
#include "varlink-util.h"
@ -532,6 +539,342 @@ static int vl_method_mount_image(
SD_JSON_BUILD_PAIR_CONDITION(!sd_id128_is_null(di->image_uuid), "imageUuid", SD_JSON_BUILD_UUID(di->image_uuid)));
}
typedef enum MountMapMode {
MOUNT_MAP_ROOT, /* map caller's UID to root in namespace (map 1 UID only) */
MOUNT_MAP_FOREIGN, /* map foreign UID range to base in namespace (map 64K) */
MOUNT_MAP_IDENTITY, /* apply identity mapping (map 64K) */
MOUNT_MAP_AUTO, /* determine automatically from image and caller */
_MOUNT_MAP_MODE_MAX,
_MOUNT_MAP_MODE_INVALID = -EINVAL,
} MountMapMode;
static const char *const mount_map_mode_table[_MOUNT_MAP_MODE_MAX] = {
[MOUNT_MAP_ROOT] = "root",
[MOUNT_MAP_FOREIGN] = "foreign",
[MOUNT_MAP_IDENTITY] = "identity",
[MOUNT_MAP_AUTO] = "auto",
};
DEFINE_PRIVATE_STRING_TABLE_LOOKUP(mount_map_mode, MountMapMode);
typedef struct MountDirectoryParameters {
MountMapMode mode;
unsigned directory_fd_idx;
unsigned userns_fd_idx;
int read_only;
} MountDirectoryParameters;
enum {
DIRECTORY_IS_ROOT_PEER_OWNED, /* This is returned if the directory is owned by the root user and the peer is root */
DIRECTORY_IS_ROOT_OWNED, /* This is returned if the directory is owned by the root user (and the peer user is not root) */
DIRECTORY_IS_PEER_OWNED, /* This is returned if the directory is owned by the peer user (who is not root) */
DIRECTORY_IS_FOREIGN_OWNED, /* This is returned if the directory is owned by the foreign UID range */
DIRECTORY_IS_OTHERWISE_OWNED, /* This is returned if the directory is owned by something else */
};
static MountMapMode default_mount_map_mode(int ownership) {
/* Derives a suitable mapping mode from the ownership of the base tree */
switch (ownership) {
case DIRECTORY_IS_PEER_OWNED:
return MOUNT_MAP_ROOT; /* Map the peer's UID to root in the container */
case DIRECTORY_IS_FOREIGN_OWNED:
return MOUNT_MAP_FOREIGN; /* Map the foreign UID range to the container's UID range */
case DIRECTORY_IS_ROOT_PEER_OWNED:
case DIRECTORY_IS_ROOT_OWNED:
case DIRECTORY_IS_OTHERWISE_OWNED:
return MOUNT_MAP_IDENTITY; /* Don't map */
default:
return _MOUNT_MAP_MODE_INVALID;
}
}
static JSON_DISPATCH_ENUM_DEFINE(dispatch_mount_directory_mode, MountMapMode, mount_map_mode_from_string);
static int validate_directory_fd(int fd, uid_t peer_uid) {
int r, fl;
assert(fd >= 0);
/* Checks if the specified directory fd looks sane. Returns 1 if it's owned by the peer (or owned by
* the foreign UID range and within another dir owned by the peer). Returns 0 if it's owened by
* something else.
*
* Note one key difference to image validation (as implemented above): for regular files if the
* client provided us with an open fd it implies the client has access, as well as what kind of
* access (i.e. ro or rw). But for directories this doesn't work the same way, as directories are
* always opened read-only only. Hence we use a different mechanism to validate access to them: we
* check if the directory is owned by the peer UID or by the foreign UID range (in the latter case
* one of the parent directories must be owned by the peer though). */
struct stat st;
if (fstat(fd, &st) < 0)
return log_debug_errno(errno, "Failed to stat() directory fd: %m");
r = stat_verify_directory(&st);
if (r < 0)
return r;
fl = fd_verify_safe_flags_full(fd, O_DIRECTORY);
if (fl < 0)
return log_debug_errno(fl, "Directory file descriptor has unsafe flags set: %m");
if (st.st_uid == 0) {
if (peer_uid == 0) {
log_debug("Directory file descriptor points to root owned directory, who is also the peer.");
return DIRECTORY_IS_ROOT_PEER_OWNED;
}
log_debug("Directory file descriptor points to root owned directory.");
return DIRECTORY_IS_ROOT_OWNED;
}
if (st.st_uid == peer_uid) {
log_debug("Directory file descriptor points to peer owned directory.");
return DIRECTORY_IS_PEER_OWNED;
}
/* For bind mounted directories we check if they are either owned by the client's UID, or by the
* foreign UID set, but in that case the parent directory must be owned by the client's UID, or some
* directory iteratively up the chain */
_cleanup_close_ int parent_fd = -EBADF;
unsigned n_level;
for (n_level = 0; n_level < 16; n_level++) {
/* Stop iteration if we find a directory up the tree that is neither owned by the user, nor is from the foreign UID range */
if (!uid_is_foreign(st.st_uid) || !gid_is_foreign(st.st_gid)) {
log_debug("Directory file descriptor points to directory which itself or its parents is neither owned by foreign UID range nor by the user.");
return DIRECTORY_IS_OTHERWISE_OWNED;
}
/* If the peer is root, then it doesn't matter if we find a parent owned by root, let's shortcut things. */
if (peer_uid == 0) {
log_debug("Directory file descriptor is owned by foreign UID range, and peer is root.");
return DIRECTORY_IS_FOREIGN_OWNED;
}
/* Go one level up */
_cleanup_close_ int new_parent_fd = openat(fd, "..", O_DIRECTORY|O_PATH|O_CLOEXEC);
if (new_parent_fd < 0)
return log_debug_errno(errno, "Failed to open parent directory of directory file descriptor: %m");
struct stat new_st;
if (fstat(new_parent_fd, &new_st) < 0)
return log_debug_errno(errno, "Failed to stat parent directory of directory file descriptor: %m");
/* Safety check to see if we hit the root dir */
if (stat_inode_same(&st, &new_st)) {
log_debug("Directory file descriptor is owned by foreign UID range, but didn't find parent directory that is owned by peer among ancestors.");
return DIRECTORY_IS_OTHERWISE_OWNED;
}
if (new_st.st_uid == peer_uid) { /* Parent inode is owned by the peer. That's good! Everything's fine. */
log_debug("Directory file descriptor is owned by foreign UID range, and ancestor is owned by peer.");
return DIRECTORY_IS_FOREIGN_OWNED;
}
close_and_replace(parent_fd, new_parent_fd);
st = new_st;
}
log_debug("Failed to find peer owned parent directory after %u levels, refusing.", n_level);
return DIRECTORY_IS_OTHERWISE_OWNED;
}
static int vl_method_mount_directory(
sd_varlink *link,
sd_json_variant *parameters,
sd_varlink_method_flags_t flags,
void *userdata) {
static const sd_json_dispatch_field dispatch_table[] = {
{ "mode", SD_JSON_VARIANT_STRING, dispatch_mount_directory_mode, offsetof(MountDirectoryParameters, mode), 0 },
{ "directoryFileDescriptor", SD_JSON_VARIANT_UNSIGNED, sd_json_dispatch_uint, offsetof(MountDirectoryParameters, directory_fd_idx), SD_JSON_MANDATORY },
{ "userNamespaceFileDescriptor", SD_JSON_VARIANT_UNSIGNED, sd_json_dispatch_uint, offsetof(MountDirectoryParameters, userns_fd_idx), 0 },
{ "readOnly", SD_JSON_VARIANT_BOOLEAN, sd_json_dispatch_tristate, offsetof(MountDirectoryParameters, read_only), 0 },
VARLINK_DISPATCH_POLKIT_FIELD,
{}
};
MountDirectoryParameters p = {
.mode = _MOUNT_MAP_MODE_INVALID,
.directory_fd_idx = UINT_MAX,
.userns_fd_idx = UINT_MAX,
.read_only = -1,
};
_cleanup_close_ int directory_fd = -EBADF, userns_fd = -EBADF;
Hashmap **polkit_registry = ASSERT_PTR(userdata);
int r;
r = sd_varlink_dispatch(link, parameters, dispatch_table, &p);
if (r != 0)
return r;
if (p.directory_fd_idx != UINT_MAX) {
directory_fd = sd_varlink_peek_dup_fd(link, p.directory_fd_idx);
if (directory_fd < 0)
return log_debug_errno(directory_fd, "Failed to peek directory fd from client: %m");
}
if (p.userns_fd_idx != UINT_MAX) {
userns_fd = sd_varlink_peek_dup_fd(link, p.userns_fd_idx);
if (userns_fd < 0)
return log_debug_errno(userns_fd, "Failed to peek user namespace fd from client: %m");
}
uid_t peer_uid;
r = sd_varlink_get_peer_uid(link, &peer_uid);
if (r < 0)
return log_debug_errno(r, "Failed to get client UID: %m");
int owned_by = validate_directory_fd(directory_fd, peer_uid);
if (owned_by < 0)
return owned_by;
r = validate_userns(link, &userns_fd);
if (r != 0)
return r;
/* If no mode is specified, pick sensible default */
if (p.mode < 0 || p.mode == MOUNT_MAP_AUTO) {
p.mode = default_mount_map_mode(owned_by);
assert(p.mode >= 0 && p.mode != MOUNT_MAP_AUTO);
}
log_debug("Mount with mapping mode: %s", mount_map_mode_to_string(p.mode));
_cleanup_free_ char *directory_path = NULL;
(void) fd_get_path(directory_fd, &directory_path);
const char *polkit_details[] = {
"read_only", one_zero(p.read_only > 0),
"directory", strna(directory_path),
NULL,
};
const char *polkit_action, *polkit_untrusted_action;
PolkitFlags polkit_flags;
if (userns_fd < 0) {
/* Mount into the host user namespace */
polkit_action = "io.systemd.mount-file-system.mount-directory";
polkit_untrusted_action = "io.systemd.mount-file-system.mount-untrusted-directory";
polkit_flags = 0;
} else {
/* Mount into a private user namespace */
polkit_action = "io.systemd.mount-file-system.mount-directory-privately";
polkit_untrusted_action = "io.systemd.mount-file-system.mount-untrusted-directory-privately";
/* If polkit is not around, let's allow mounting authenticated images by default */
polkit_flags = POLKIT_DEFAULT_ALLOW;
}
/* We consider a directory "trusted" if it is owned by the peer or the foreign UID range */
bool trusted_directory = IN_SET(owned_by, DIRECTORY_IS_ROOT_PEER_OWNED, DIRECTORY_IS_PEER_OWNED, DIRECTORY_IS_FOREIGN_OWNED);
/* Let's definitely acquire the regular action privilege, for mounting properly signed images */
r = varlink_verify_polkit_async_full(
link,
/* bus= */ NULL,
trusted_directory ? polkit_action : polkit_untrusted_action,
polkit_details,
/* good_user= */ UID_INVALID,
trusted_directory ? polkit_flags : 0,
polkit_registry);
if (r <= 0)
return r;
r = get_common_dissect_directory(NULL);
if (r < 0)
return r;
_cleanup_close_ int mount_fd = open_tree(directory_fd, "", OPEN_TREE_CLONE|OPEN_TREE_CLOEXEC|AT_SYMLINK_NOFOLLOW|AT_EMPTY_PATH);
if (mount_fd < 0)
return log_debug_errno(errno, "Failed to issue open_tree() of provided directory: %m");
if (p.read_only > 0 && mount_setattr(
mount_fd, "", AT_EMPTY_PATH,
&(struct mount_attr) {
.attr_set = MOUNT_ATTR_RDONLY,
}, MOUNT_ATTR_SIZE_VER0) < 0)
return log_debug_errno(errno, "Failed to enable read-only mode: %m");
if (p.mode != MOUNT_MAP_IDENTITY) {
uid_t start;
if (userns_fd >= 0) {
_cleanup_(uid_range_freep) UIDRange *uid_range_outside = NULL, *uid_range_inside = NULL, *gid_range_outside = NULL, *gid_range_inside = NULL;
r = uid_range_load_userns_by_fd(userns_fd, UID_RANGE_USERNS_OUTSIDE, &uid_range_outside);
if (r < 0)
return log_debug_errno(r, "Failed to load outside UID range of provided userns: %m");
r = uid_range_load_userns_by_fd(userns_fd, UID_RANGE_USERNS_INSIDE, &uid_range_inside);
if (r < 0)
return log_debug_errno(r, "Failed to load inside UID range of provided userns: %m");
r = uid_range_load_userns_by_fd(userns_fd, GID_RANGE_USERNS_OUTSIDE, &gid_range_outside);
if (r < 0)
return log_debug_errno(r, "Failed to load outside GID range of provided userns: %m");
r = uid_range_load_userns_by_fd(userns_fd, GID_RANGE_USERNS_INSIDE, &gid_range_inside);
if (r < 0)
return log_debug_errno(r, "Failed to load inside GID range of provided userns: %m");
/* Be very strict for now */
if (!uid_range_equal(uid_range_outside, gid_range_outside) ||
!uid_range_equal(uid_range_inside, gid_range_inside) ||
uid_range_outside->n_entries != 1 ||
uid_range_outside->entries[0].nr != 0x10000 ||
uid_range_inside->n_entries != 1 ||
uid_range_inside->entries[0].start != 0 ||
uid_range_inside->entries[0].nr != 0x10000)
return sd_varlink_error_invalid_parameter_name(link, "userNamespaceFileDescriptor");
start = uid_range_outside->entries[0].start;
} else
start = 0;
_cleanup_free_ char *new_uid_map = NULL;
switch (p.mode) {
case MOUNT_MAP_ROOT:
r = strextendf(&new_uid_map, UID_FMT " " UID_FMT " " UID_FMT,
peer_uid, start, (uid_t) 1);
break;
case MOUNT_MAP_FOREIGN:
r = strextendf(&new_uid_map, UID_FMT " " UID_FMT " " UID_FMT,
(uid_t) FOREIGN_UID_MIN, start, (uid_t) 0x10000);
break;
default:
assert_not_reached();
}
_cleanup_close_ int idmap_userns_fd = userns_acquire(new_uid_map, new_uid_map);
if (idmap_userns_fd < 0)
return log_debug_errno(idmap_userns_fd, "Failed to acquire user namespace for id mapping: %m");
if (mount_setattr(mount_fd, "", AT_EMPTY_PATH,
&(struct mount_attr) {
.attr_set = MOUNT_ATTR_IDMAP,
.userns_fd = idmap_userns_fd,
.propagation = MS_PRIVATE,
}, MOUNT_ATTR_SIZE_VER0) < 0)
return log_debug_errno(errno, "Failed to enable id mapping: %m");
}
if (userns_fd >= 0) {
r = nsresource_add_mount(userns_fd, mount_fd);
if (r < 0)
return r;
}
int fd_idx = sd_varlink_push_fd(link, mount_fd);
if (fd_idx < 0)
return fd_idx;
TAKE_FD(mount_fd);
return sd_varlink_replybo(
link,
SD_JSON_BUILD_PAIR("mountFileDescriptor", SD_JSON_BUILD_INTEGER(fd_idx)));
}
static int process_connection(sd_varlink_server *server, int _fd) {
_cleanup_close_ int fd = TAKE_FD(_fd); /* always take possession */
_cleanup_(sd_varlink_close_unrefp) sd_varlink *vl = NULL;
@ -606,7 +949,8 @@ static int run(int argc, char *argv[]) {
r = sd_varlink_server_bind_method_many(
server,
"io.systemd.MountFileSystem.MountImage", vl_method_mount_image);
"io.systemd.MountFileSystem.MountImage", vl_method_mount_image,
"io.systemd.MountFileSystem.MountDirectory", vl_method_mount_directory);
if (r < 0)
return log_error_errno(r, "Failed to bind methods: %m");

View File

@ -7,7 +7,6 @@ libnspawn_core_sources = files(
'nspawn-mount.c',
'nspawn-network.c',
'nspawn-oci.c',
'nspawn-patch-uid.c',
'nspawn-register.c',
'nspawn-seccomp.c',
'nspawn-settings.c',

View File

@ -231,7 +231,7 @@ int bind_user_prepare(
_cleanup_(group_record_unrefp) GroupRecord *g = NULL, *cg = NULL;
_cleanup_free_ char *sm = NULL, *sd = NULL;
r = userdb_by_name(*n, USERDB_DONT_SYNTHESIZE, &u);
r = userdb_by_name(*n, USERDB_DONT_SYNTHESIZE_INTRINSIC|USERDB_DONT_SYNTHESIZE_FOREIGN, &u);
if (r < 0)
return log_error_errno(r, "Failed to resolve user '%s': %m", *n);
@ -252,7 +252,7 @@ int bind_user_prepare(
if (u->uid >= uid_shift && u->uid < uid_shift + uid_range)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "UID of user '%s' to map is already in container UID range, refusing.", u->user_name);
r = groupdb_by_gid(u->gid, USERDB_DONT_SYNTHESIZE, &g);
r = groupdb_by_gid(u->gid, USERDB_DONT_SYNTHESIZE_INTRINSIC|USERDB_DONT_SYNTHESIZE_FOREIGN, &g);
if (r < 0)
return log_error_errno(r, "Failed to resolve group of user '%s': %m", u->user_name);

View File

@ -117,7 +117,7 @@ int create_subcgroup(
CGroupUnified unified_requested,
uid_t uid_shift,
int userns_fd,
bool privileged) {
UserNamespaceMode userns_mode) {
_cleanup_free_ char *cgroup = NULL, *payload = NULL;
CGroupMask supported;
@ -161,14 +161,14 @@ int create_subcgroup(
if (!payload)
return log_oom();
if (privileged)
if (userns_mode != USER_NAMESPACE_MANAGED)
r = cg_create_and_attach(SYSTEMD_CGROUP_CONTROLLER, payload, pid);
else
r = cg_create(SYSTEMD_CGROUP_CONTROLLER, payload);
if (r < 0)
return log_error_errno(r, "Failed to create %s subcgroup: %m", payload);
if (privileged) {
if (userns_mode != USER_NAMESPACE_MANAGED) {
_cleanup_free_ char *fs = NULL;
r = cg_get_path(SYSTEMD_CGROUP_CONTROLLER, payload, NULL, &fs);
if (r < 0)

View File

@ -5,9 +5,10 @@
#include <sys/types.h>
#include "cgroup-util.h"
#include "nspawn-settings.h"
int sync_cgroup(pid_t pid, CGroupUnified unified_requested, uid_t uid_shift);
int create_subcgroup(pid_t pid, bool keep_unit, CGroupUnified unified_requested, uid_t uid_shift, int userns_fd, bool privileged);
int create_subcgroup(pid_t pid, bool keep_unit, CGroupUnified unified_requested, uid_t uid_shift, int userns_fd, UserNamespaceMode userns_mode);
int mount_cgroups(const char *dest, CGroupUnified unified_requested, bool userns, uid_t uid_shift, uid_t uid_range, const char *selinux_apifs_context, bool use_cgns);
int mount_systemd_cgroup_writable(const char *dest, CGroupUnified unified_requested);

View File

@ -1,9 +0,0 @@
/* SPDX-License-Identifier: LGPL-2.1-or-later */
#pragma once
#include <sys/types.h>
/* While we are chmod()ing a directory tree, we set the top-level UID base to this "busy" base, so that we can always
* recognize trees we are were chmod()ing recursively and got interrupted in */
#define UID_BUSY_BASE ((uid_t) UINT32_C(0xFFFE0000))
#define UID_BUSY_MASK ((uid_t) UINT32_C(0xFFFF0000))

View File

@ -475,7 +475,8 @@ int mount_sysfs(const char *dest, MountSettingsMask mount_settings) {
if (!full)
return log_oom();
(void) mkdir(full, 0755);
if (mkdir(full, 0755) < 0 && errno != EEXIST)
return log_error_errno(errno, "Failed to create directory '%s': %m", full);
if (FLAGS_SET(mount_settings, MOUNT_APPLY_APIVFS_RO))
extra_flags |= MS_RDONLY;
@ -594,11 +595,11 @@ int mount_all(const char *dest,
{ "tmpfs", "/tmp", "tmpfs", "mode=01777" NESTED_TMPFS_LIMITS, MS_NOSUID|MS_NODEV|MS_STRICTATIME,
MOUNT_FATAL|MOUNT_APPLY_TMPFS_TMP|MOUNT_MKDIR },
{ "tmpfs", "/sys", "tmpfs", "mode=0555" TMPFS_LIMITS_SYS, MS_NOSUID|MS_NOEXEC|MS_NODEV,
MOUNT_FATAL|MOUNT_APPLY_APIVFS_NETNS|MOUNT_MKDIR|MOUNT_PRIVILEGED },
MOUNT_FATAL|MOUNT_APPLY_APIVFS_NETNS|MOUNT_MKDIR|MOUNT_UNMANAGED },
{ "sysfs", "/sys", "sysfs", NULL, SYS_DEFAULT_MOUNT_FLAGS,
MOUNT_FATAL|MOUNT_APPLY_APIVFS_RO|MOUNT_MKDIR|MOUNT_PRIVILEGED }, /* skipped if above was mounted */
MOUNT_FATAL|MOUNT_APPLY_APIVFS_RO|MOUNT_MKDIR|MOUNT_UNMANAGED }, /* skipped if above was mounted */
{ "sysfs", "/sys", "sysfs", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV,
MOUNT_FATAL|MOUNT_MKDIR|MOUNT_PRIVILEGED }, /* skipped if above was mounted */
MOUNT_FATAL|MOUNT_MKDIR|MOUNT_UNMANAGED }, /* skipped if above was mounted */
{ "tmpfs", "/dev", "tmpfs", "mode=0755" TMPFS_LIMITS_PRIVATE_DEV, MS_NOSUID|MS_STRICTATIME,
MOUNT_FATAL|MOUNT_MKDIR },
{ "tmpfs", "/dev/shm", "tmpfs", "mode=01777" NESTED_TMPFS_LIMITS, MS_NOSUID|MS_NODEV|MS_STRICTATIME,
@ -621,9 +622,9 @@ int mount_all(const char *dest,
{ "/sys/fs/selinux", "/sys/fs/selinux", NULL, NULL, MS_BIND,
MOUNT_MKDIR|MOUNT_PRIVILEGED }, /* Bind mount first (mkdir/chown the mount point in case /sys/ is mounted as minimal skeleton tmpfs) */
{ NULL, "/sys/fs/selinux", NULL, NULL, MS_BIND|MS_RDONLY|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_REMOUNT,
MOUNT_PRIVILEGED }, /* Then, make it r/o (don't mkdir/chown the mount point here, the previous entry already did that) */
MOUNT_UNMANAGED|MOUNT_PRIVILEGED }, /* Then, make it r/o (don't mkdir/chown the mount point here, the previous entry already did that) */
{ NULL, "/sys/fs/selinux", NULL, NULL, MS_PRIVATE,
MOUNT_PRIVILEGED }, /* Turn off propagation (we only want that for the mount propagation tunnel dir) */
MOUNT_UNMANAGED|MOUNT_PRIVILEGED }, /* Turn off propagation (we only want that for the mount propagation tunnel dir) */
#endif
};
@ -632,6 +633,7 @@ int mount_all(const char *dest,
bool ro = FLAGS_SET(mount_settings, MOUNT_APPLY_APIVFS_RO);
bool in_userns = FLAGS_SET(mount_settings, MOUNT_IN_USERNS);
bool tmpfs_tmp = FLAGS_SET(mount_settings, MOUNT_APPLY_TMPFS_TMP);
bool unmanaged = FLAGS_SET(mount_settings, MOUNT_UNMANAGED);
bool privileged = FLAGS_SET(mount_settings, MOUNT_PRIVILEGED);
int r;
@ -641,7 +643,7 @@ int mount_all(const char *dest,
const char *o;
/* If we are not privileged but the entry is marked as privileged and to be mounted outside the user namespace, then skip it */
if (!privileged && FLAGS_SET(m->mount_settings, MOUNT_PRIVILEGED) && !FLAGS_SET(m->mount_settings, MOUNT_IN_USERNS))
if (!unmanaged && FLAGS_SET(m->mount_settings, MOUNT_UNMANAGED) && !FLAGS_SET(m->mount_settings, MOUNT_IN_USERNS))
continue;
if (in_userns != FLAGS_SET(m->mount_settings, MOUNT_IN_USERNS))
@ -656,6 +658,9 @@ int mount_all(const char *dest,
if (!tmpfs_tmp && FLAGS_SET(m->mount_settings, MOUNT_APPLY_TMPFS_TMP))
continue;
if (!privileged && FLAGS_SET(m->mount_settings, MOUNT_PRIVILEGED))
continue;
r = chase(m->where, dest, CHASE_NONEXISTENT|CHASE_PREFIX_ROOT, &where, NULL);
if (r < 0)
return log_error_errno(r, "Failed to resolve %s%s: %m", strempty(dest), m->where);
@ -1405,9 +1410,11 @@ done:
#define NSPAWN_PRIVATE_FULLY_VISIBLE_PROCFS "/run/host/proc"
#define NSPAWN_PRIVATE_FULLY_VISIBLE_SYSFS "/run/host/sys"
int pin_fully_visible_fs(void) {
int pin_fully_visible_api_fs(void) {
int r;
log_debug("Pinning fully visible API FS");
(void) mkdir_p(NSPAWN_PRIVATE_FULLY_VISIBLE_PROCFS, 0755);
(void) mkdir_p(NSPAWN_PRIVATE_FULLY_VISIBLE_SYSFS, 0755);
@ -1422,7 +1429,7 @@ int pin_fully_visible_fs(void) {
return 0;
}
static int do_wipe_fully_visible_fs(void) {
static int do_wipe_fully_visible_api_fs(void) {
if (umount2(NSPAWN_PRIVATE_FULLY_VISIBLE_PROCFS, MNT_DETACH) < 0)
return log_error_errno(errno, "Failed to unmount temporary proc: %m");
@ -1438,10 +1445,12 @@ static int do_wipe_fully_visible_fs(void) {
return 0;
}
int wipe_fully_visible_fs(int mntns_fd) {
int wipe_fully_visible_api_fs(int mntns_fd) {
_cleanup_close_ int orig_mntns_fd = -EBADF;
int r, rr;
log_debug("Wiping fully visible API FS");
r = namespace_open(0,
/* ret_pidns_fd = */ NULL,
&orig_mntns_fd,
@ -1459,7 +1468,7 @@ int wipe_fully_visible_fs(int mntns_fd) {
if (r < 0)
return log_error_errno(r, "Failed to enter mount namespace: %m");
rr = do_wipe_fully_visible_fs();
rr = do_wipe_fully_visible_api_fs();
r = namespace_enter(/* pidns_fd = */ -EBADF,
orig_mntns_fd,

View File

@ -20,7 +20,8 @@ typedef enum MountSettingsMask {
MOUNT_TOUCH = 1 << 9, /* if set, touch file to mount over first */
MOUNT_PREFIX_ROOT = 1 << 10,/* if set, prefix the source path with the container's root directory */
MOUNT_FOLLOW_SYMLINKS = 1 << 11,/* if set, we'll follow symlinks for the mount target */
MOUNT_PRIVILEGED = 1 << 12,/* if set, we'll only mount this in the outer child if we are running in privileged mode */
MOUNT_UNMANAGED = 1 << 12,/* if set, we'll only mount this in the outer child if we are running in privileged mode */
MOUNT_PRIVILEGED = 1 << 13,/* if set, we'll only mount this if we have full privileges */
} MountSettingsMask;
typedef enum CustomMountType {
@ -73,5 +74,6 @@ int pivot_root_parse(char **pivot_root_new, char **pivot_root_old, const char *s
int setup_pivot_root(const char *directory, const char *pivot_root_new, const char *pivot_root_old);
int tmpfs_patch_options(const char *options,uid_t uid_shift, const char *selinux_apifs_context, char **ret);
int pin_fully_visible_fs(void);
int wipe_fully_visible_fs(int mntns_fd);
int pin_fully_visible_api_fs(void);
int wipe_fully_visible_api_fs(int mntns_fd);

View File

@ -933,6 +933,7 @@ static const char *const user_namespace_ownership_table[_USER_NAMESPACE_OWNERSHI
[USER_NAMESPACE_OWNERSHIP_OFF] = "off",
[USER_NAMESPACE_OWNERSHIP_CHOWN] = "chown",
[USER_NAMESPACE_OWNERSHIP_MAP] = "map",
[USER_NAMESPACE_OWNERSHIP_FOREIGN] = "foreign",
[USER_NAMESPACE_OWNERSHIP_AUTO] = "auto",
};

View File

@ -29,14 +29,16 @@ typedef enum UserNamespaceMode {
USER_NAMESPACE_NO,
USER_NAMESPACE_FIXED,
USER_NAMESPACE_PICK,
USER_NAMESPACE_MANAGED,
_USER_NAMESPACE_MODE_MAX,
_USER_NAMESPACE_MODE_INVALID = -EINVAL,
} UserNamespaceMode;
typedef enum UserNamespaceOwnership {
USER_NAMESPACE_OWNERSHIP_OFF,
USER_NAMESPACE_OWNERSHIP_CHOWN,
USER_NAMESPACE_OWNERSHIP_MAP,
USER_NAMESPACE_OWNERSHIP_OFF, /* do not change ownership */
USER_NAMESPACE_OWNERSHIP_CHOWN, /* chown to target range */
USER_NAMESPACE_OWNERSHIP_MAP, /* map from 0x00000000…0x0000FFFF range to target range */
USER_NAMESPACE_OWNERSHIP_FOREIGN, /* map from 0x7FFE0000…0x7FFEFFFF range to target range */
USER_NAMESPACE_OWNERSHIP_AUTO,
_USER_NAMESPACE_OWNERSHIP_MAX,
_USER_NAMESPACE_OWNERSHIP_INVALID = -1,

View File

@ -42,6 +42,7 @@
#include "copy.h"
#include "cpu-set-util.h"
#include "dev-setup.h"
#include "devnum-util.h"
#include "discover-image.h"
#include "dissect-image.h"
#include "env-util.h"
@ -73,7 +74,6 @@
#include "nspawn-mount.h"
#include "nspawn-network.h"
#include "nspawn-oci.h"
#include "nspawn-patch-uid.h"
#include "nspawn-register.h"
#include "nspawn-seccomp.h"
#include "nspawn-settings.h"
@ -96,6 +96,7 @@
#include "rlimit-util.h"
#include "rm-rf.h"
#include "seccomp-util.h"
#include "shift-uid.h"
#include "signal-util.h"
#include "socket-util.h"
#include "stat-util.h"
@ -106,6 +107,7 @@
#include "sysctl-util.h"
#include "terminal-util.h"
#include "tmpfile-util.h"
#include "uid-classification.h"
#include "umask-util.h"
#include "unit-name.h"
#include "user-util.h"
@ -138,7 +140,7 @@ static char *arg_hostname = NULL; /* The name the payload sees by default */
static const char *arg_selinux_context = NULL;
static const char *arg_selinux_apifs_context = NULL;
static char *arg_slice = NULL;
static bool arg_private_network = false;
static bool arg_private_network; /* initialized depending on arg_privileged in run() */
static bool arg_read_only = false;
static StartMode arg_start_mode = START_PID1;
static bool arg_ephemeral = false;
@ -196,7 +198,7 @@ static VolatileMode arg_volatile_mode = VOLATILE_NO;
static ExposePort *arg_expose_ports = NULL;
static char **arg_property = NULL;
static sd_bus_message *arg_property_message = NULL;
static UserNamespaceMode arg_userns_mode = USER_NAMESPACE_NO;
static UserNamespaceMode arg_userns_mode; /* initialized depending on arg_privileged in run() */
static uid_t arg_uid_shift = UID_INVALID, arg_uid_range = 0x10000U;
static UserNamespaceOwnership arg_userns_ownership = _USER_NAMESPACE_OWNERSHIP_INVALID;
static int arg_kill_signal = 0;
@ -368,7 +370,7 @@ static int help(void) {
" the service unit nspawn is running in\n"
"\n%3$sUser Namespacing:%4$s\n"
" --private-users=no Run without user namespacing\n"
" --private-users=yes|pick|identity\n"
" --private-users=yes|pick|identity|managed\n"
" Run within user namespace, autoselect UID/GID range\n"
" --private-users=UIDBASE[:NUIDS]\n"
" Similar, but with user configured UID/GID range\n"
@ -517,7 +519,7 @@ static int detect_unified_cgroup_hierarchy_from_environment(void) {
static int detect_unified_cgroup_hierarchy_from_image(const char *directory) {
int r;
if (!arg_privileged) {
if (arg_userns_mode == USER_NAMESPACE_MANAGED) {
/* We only support the unified mode when running unprivileged */
arg_unified_cgroup_hierarchy = CGROUP_UNIFIED_ALL;
return 0;
@ -1250,12 +1252,17 @@ static int parse_argv(int argc, char *argv[]) {
arg_uid_range = UINT32_C(0x10000);
} else if (streq(optarg, "identity")) {
/* identity: User namespaces on, UID range is map the 0…0xFFFF range to
/* identity: User namespaces on, UID range is map of the 0…0xFFFF range to
* itself, i.e. we don't actually map anything, but do take benefit of
* isolation of capability sets. */
arg_userns_mode = USER_NAMESPACE_FIXED;
arg_uid_shift = 0;
arg_uid_range = UINT32_C(0x10000);
} else if (streq(optarg, "managed")) {
/* managed: User namespace on, and acquire it from systemd-nsresourced */
arg_userns_mode = USER_NAMESPACE_MANAGED;
arg_uid_shift = UID_INVALID;
arg_uid_range = UINT32_C(0x10000);
} else {
/* anything else: User namespacing on, UID range is explicitly configured */
r = parse_userns_uid_range(optarg, &arg_uid_shift, &arg_uid_range);
@ -1270,9 +1277,8 @@ static int parse_argv(int argc, char *argv[]) {
case 'U':
if (userns_supported()) {
arg_userns_mode = USER_NAMESPACE_PICK; /* Note that arg_userns_ownership is
* implied by USER_NAMESPACE_PICK
* further down. */
/* Note that arg_userns_ownership is implied by USER_NAMESPACE_PICK further down. */
arg_userns_mode = arg_privileged ? USER_NAMESPACE_PICK : USER_NAMESPACE_MANAGED;
arg_uid_shift = UID_INVALID;
arg_uid_range = UINT32_C(0x10000);
@ -1655,14 +1661,23 @@ static int parse_argv(int argc, char *argv[]) {
static int verify_arguments(void) {
int r;
SET_FLAG(arg_mount_settings, MOUNT_PRIVILEGED, arg_privileged);
SET_FLAG(arg_mount_settings, MOUNT_UNMANAGED, arg_userns_mode != USER_NAMESPACE_MANAGED);
if (!arg_privileged) {
if (!arg_private_network) {
log_notice("Automatically implying --private-network, since mounting /sys/ in an unprivileged user namespaces requires network namespacing.");
arg_private_network = true;
}
}
/* We can mount selinuxfs only if we are privileged and can do so before userns. In managed mode we
* have to enter the userns earlier, hence cannot do that. */
/* SET_FLAG(arg_mount_settings, MOUNT_PRIVILEGED, arg_privileged); */
SET_FLAG(arg_mount_settings, MOUNT_PRIVILEGED, arg_userns_mode != USER_NAMESPACE_MANAGED);
SET_FLAG(arg_mount_settings, MOUNT_USE_USERNS, arg_userns_mode != USER_NAMESPACE_NO);
if (arg_private_network)
SET_FLAG(arg_mount_settings, MOUNT_APPLY_APIVFS_NETNS, arg_private_network);
if (!arg_privileged && arg_userns_mode != USER_NAMESPACE_MANAGED)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Unprivileged operation requires managed user namespaces, as otherwise no UID range can be acquired.");
if (arg_userns_mode == USER_NAMESPACE_MANAGED && !arg_private_network)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Unprivileged operation requires private networking, as otherwise /sys/ may not be mounted.");
if (arg_start_mode == START_PID2 && arg_unified_cgroup_hierarchy == CGROUP_UNIFIED_UNKNOWN) {
/* If we are running the stub init in the container, we don't need to look at what the init
@ -1683,12 +1698,6 @@ static int verify_arguments(void) {
arg_unified_cgroup_hierarchy = CGROUP_UNIFIED_NONE;
}
if (arg_userns_mode != USER_NAMESPACE_NO)
arg_mount_settings |= MOUNT_USE_USERNS;
if (arg_private_network)
arg_mount_settings |= MOUNT_APPLY_APIVFS_NETNS;
if (!(arg_clone_ns_flags & CLONE_NEWPID) ||
!(arg_clone_ns_flags & CLONE_NEWUTS)) {
arg_register = false;
@ -1698,8 +1707,7 @@ static int verify_arguments(void) {
if (arg_userns_ownership < 0)
arg_userns_ownership =
arg_userns_mode == USER_NAMESPACE_PICK ? USER_NAMESPACE_OWNERSHIP_AUTO :
USER_NAMESPACE_OWNERSHIP_OFF;
IN_SET(arg_userns_mode, USER_NAMESPACE_PICK, USER_NAMESPACE_MANAGED) ? USER_NAMESPACE_OWNERSHIP_AUTO : USER_NAMESPACE_OWNERSHIP_OFF;
if (arg_start_mode == START_BOOT && arg_kill_signal <= 0)
arg_kill_signal = SIGRTMIN+3;
@ -1762,7 +1770,7 @@ static int verify_arguments(void) {
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Invalid namespacing settings. Mounting sysfs with --private-users requires --private-network.");
if (arg_userns_mode != USER_NAMESPACE_NO && !(arg_mount_settings & MOUNT_APPLY_APIVFS_RO))
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Cannot combine --private-users with read-write mounts.");
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Cannot combine --private-users with read-write API VFS mounts.");
if (arg_expose_ports && !arg_private_network)
return log_error_errno(SYNTHETIC_ERRNO(EINVAL), "Cannot use --port= without private networking.");
@ -1808,10 +1816,18 @@ static int verify_network_interfaces_initialized(void) {
return 0;
}
int userns_lchown(const char *p, uid_t uid, gid_t gid) {
assert(p);
static int in_child_chown(void) {
/* Returns true when chown()ing inodes we create inside the outer child is required. Basically, we
* need the chowning when we implement userns ourselves. If userns is off we don#t need to chown(),
* obviously. And if we are in managed mode we already entered the userns, and hence don#t need to
* manually chown either. */
return IN_SET(arg_userns_mode, USER_NAMESPACE_PICK, USER_NAMESPACE_FIXED);
}
if (arg_userns_mode == USER_NAMESPACE_NO)
static int userns_chown_at(int fd, const char *fname, uid_t uid, gid_t gid, int flags) {
assert(fd >= 0 || fd == AT_FDCWD);
if (!in_child_chown())
return 0;
if (uid == UID_INVALID && gid == GID_INVALID)
@ -1831,21 +1847,31 @@ int userns_lchown(const char *p, uid_t uid, gid_t gid) {
return -EOVERFLOW;
}
return RET_NERRNO(lchown(p, uid, gid));
return RET_NERRNO(fchownat(fd, strempty(fname), uid, gid, flags));
}
int userns_lchown(const char *path, uid_t uid, gid_t gid) {
return userns_chown_at(AT_FDCWD, path, uid, gid, AT_SYMLINK_NOFOLLOW);
}
int userns_mkdir(const char *root, const char *path, mode_t mode, uid_t uid, gid_t gid) {
const char *q;
int r;
q = prefix_roota(root, path);
r = RET_NERRNO(mkdir(q, mode));
if (r == -EEXIST)
return 0;
assert(path);
_cleanup_close_ int parent_fd = -EBADF;
_cleanup_free_ char *dname = NULL;
r = chase(path, root, CHASE_PARENT|CHASE_PREFIX_ROOT|CHASE_EXTRACT_FILENAME, &dname, &parent_fd);
if (r < 0)
return r;
return userns_lchown(q, uid, gid);
_cleanup_close_ int dir_fd = open_mkdir_at(parent_fd, dname, O_EXCL|O_CLOEXEC, mode);
if (dir_fd == -EEXIST)
return 0;
if (dir_fd < 0)
return dir_fd;
return userns_chown_at(dir_fd, /* fname= */ NULL, uid, gid, AT_SYMLINK_NOFOLLOW|AT_EMPTY_PATH);
}
static const char *timezone_from_path(const char *path) {
@ -2284,18 +2310,24 @@ static int copy_devnode_one(const char *dest, const char *node, bool ignore_mkno
if (r < 0)
return log_error_errno(r, "Failed to create directory %s: %m", parent);
if (mknod(to, st.st_mode, st.st_rdev) < 0) {
r = -errno; /* Save the original error code. */
r = RET_NERRNO(mknod(to, st.st_mode, st.st_rdev));
if (r < 0) {
/* Explicitly warn the user when /dev/ is already populated. */
if (r == -EEXIST)
log_notice("%s/dev/ is pre-mounted and pre-populated. If a pre-mounted /dev/ is provided it needs to be an unpopulated file system.", dest);
/* If arg_uid_shift != 0, then we cannot fall back to use bind mount. */
if (arg_uid_shift != 0) {
if (!(arg_userns_mode == USER_NAMESPACE_NO ||
(arg_userns_mode == USER_NAMESPACE_FIXED && arg_uid_shift == 0))) {
if (ignore_mknod_failure) {
log_debug_errno(r, "Failed to mknod(%s), ignoring: %m", to);
return 0;
}
if (arg_userns_mode != USER_NAMESPACE_MANAGED || !ERRNO_IS_NEG_PRIVILEGE(r))
return log_error_errno(r, "Failed to mknod(%s): %m", to);
log_debug_errno(r, "Failed to create device node '%s' and running in managed mode, resorting to bind mount: %m", to);
}
/* Some systems abusively restrict mknod but allow bind mounts. */
@ -2323,7 +2355,7 @@ static int copy_devnode_one(const char *dest, const char *node, bool ignore_mkno
return log_error_errno(r, "Failed to create '%s': %m", dn);
_cleanup_free_ char *sl = NULL;
if (asprintf(&sl, "%s/%u:%u", dn, major(st.st_rdev), minor(st.st_rdev)) < 0)
if (asprintf(&sl, "%s/" DEVNUM_FORMAT_STR, dn, DEVNUM_FORMAT_VAL(st.st_rdev)) < 0)
return log_oom();
_cleanup_free_ char *prefixed = path_join(dest, sl);
@ -2391,7 +2423,7 @@ static int make_extra_nodes(const char *dest) {
return 0;
}
static int setup_pts(const char *dest) {
static int setup_pts(const char *dest, uid_t chown_uid) {
_cleanup_free_ char *options = NULL;
const char *p;
int r;
@ -2400,13 +2432,13 @@ static int setup_pts(const char *dest) {
if (arg_selinux_apifs_context)
(void) asprintf(&options,
"newinstance,ptmxmode=0666,mode=" STRINGIFY(TTY_MODE) ",gid=" GID_FMT ",context=\"%s\"",
arg_uid_shift + TTY_GID,
chown_uid + TTY_GID,
arg_selinux_apifs_context);
else
#endif
(void) asprintf(&options,
"newinstance,ptmxmode=0666,mode=" STRINGIFY(TTY_MODE) ",gid=" GID_FMT,
arg_uid_shift + TTY_GID);
chown_uid + TTY_GID);
if (!options)
return log_oom();
@ -2568,11 +2600,10 @@ static int setup_credentials(const char *root) {
if (fchmod(fd, world_readable ? 0444 : 0400) < 0)
return log_error_errno(errno, "Failed to adjust access mode of %s: %m", j);
if (arg_userns_mode != USER_NAMESPACE_NO) {
if (arg_userns_mode != USER_NAMESPACE_NO)
if (fchown(fd, arg_uid_shift, arg_uid_shift) < 0)
return log_error_errno(errno, "Failed to adjust ownership of %s: %m", j);
}
}
if (chmod(q, world_readable ? 0555 : 0500) < 0)
return log_error_errno(errno, "Failed to adjust access mode of %s: %m", q);
@ -2844,10 +2875,12 @@ static int reset_audit_loginuid(void) {
if ((arg_clone_ns_flags & CLONE_NEWPID) == 0)
return 0;
if (!arg_privileged)
/* if we are in managed userns mode, then we are already in our userns, hence we cannot reset the
* loginuid anyway, hence don't bother */
if (arg_userns_mode == USER_NAMESPACE_MANAGED)
return 0;
r = read_one_line_file("/proc/self/loginuid", &p);
r = read_virtual_file("/proc/self/loginuid", SIZE_MAX, &p, /* ret_size= */ NULL);
if (r == -ENOENT)
return 0;
if (r < 0)
@ -2876,7 +2909,7 @@ static int mount_tunnel_dig(const char *root) {
const char *p, *q;
int r;
if (!arg_privileged) {
if (arg_userns_mode == USER_NAMESPACE_MANAGED) {
log_debug("Not digging mount tunnel, because running unprivileged.");
return 0;
}
@ -2909,7 +2942,7 @@ static int mount_tunnel_dig(const char *root) {
static int mount_tunnel_open(void) {
int r;
if (!arg_privileged) {
if (arg_userns_mode == USER_NAMESPACE_MANAGED) {
log_debug("Not opening up mount tunnel, because running unprivileged.");
return 0;
}
@ -3166,7 +3199,8 @@ static int determine_names(void) {
if (arg_machine) {
_cleanup_(image_unrefp) Image *i = NULL;
r = image_find(IMAGE_MACHINE, arg_machine, NULL, &i);
r = image_find(arg_privileged ? RUNTIME_SCOPE_SYSTEM : RUNTIME_SCOPE_USER,
IMAGE_MACHINE, arg_machine, NULL, &i);
if (r == -ENOENT)
return log_error_errno(r, "No image for machine '%s'.", arg_machine);
if (r < 0)
@ -3254,6 +3288,13 @@ static int chase_and_update(char **p, unsigned flags) {
}
static int determine_uid_shift(const char *directory) {
assert(directory);
if (arg_userns_mode == USER_NAMESPACE_MANAGED) {
/* In managed mode we should already know the UID shift */
assert(uid_is_valid(arg_uid_shift));
return 0;
}
if (arg_userns_mode == USER_NAMESPACE_NO) {
arg_uid_shift = 0;
@ -3435,10 +3476,9 @@ static int inner_child(
if (!arg_network_namespace_path && arg_private_network) {
_cleanup_close_ int netns_fd = -EBADF;
if (arg_privileged) {
if (arg_userns_mode != USER_NAMESPACE_MANAGED)
if (unshare(CLONE_NEWNET) < 0)
return log_error_errno(errno, "Failed to unshare network namespace: %m");
}
netns_fd = namespace_open_by_type(NAMESPACE_NET);
if (netns_fd < 0)
@ -3452,8 +3492,10 @@ static int inner_child(
(void) barrier_place(barrier); /* #3 */
}
if (arg_privileged) {
r = mount_sysfs(NULL, arg_mount_settings);
if (arg_userns_mode != USER_NAMESPACE_MANAGED) {
log_notice("BEFORE");
r = mount_sysfs(NULL, arg_mount_settings | MOUNT_IN_USERNS);
log_notice("AFTER");
if (r < 0)
return r;
}
@ -3697,7 +3739,7 @@ static int inner_child(
return log_error_errno(errno, "Failed to acquire controlling TTY: %m");
}
log_debug("Inner child completed, invoking payload.");
log_debug("Inner child finished, invoking payload.");
/* Now, explicitly close the log, so that we then can close all remaining fds. Closing the log explicitly first
* has the benefit that the logging subsystem knows about it, and is thus ready to be reopened should we need
@ -3806,7 +3848,7 @@ static int setup_unix_export_dir_outside(char **ret) {
assert(ret);
if (!arg_privileged) {
if (arg_userns_mode == USER_NAMESPACE_MANAGED) {
log_debug("Not digging socket tunnel, because running unprivileged.");
return 0;
}
@ -3863,7 +3905,7 @@ static int setup_unix_export_host_inside(const char *directory, const char *unix
assert(directory);
if (!arg_privileged)
if (arg_userns_mode == USER_NAMESPACE_MANAGED)
return 0;
assert(unix_export_path);
@ -3917,12 +3959,15 @@ static DissectImageFlags determine_dissect_image_flags(void) {
DISSECT_IMAGE_PIN_PARTITION_DEVICES |
(arg_read_only ? DISSECT_IMAGE_READ_ONLY : DISSECT_IMAGE_FSCK|DISSECT_IMAGE_GROWFS) |
DISSECT_IMAGE_ALLOW_USERSPACE_VERITY |
(arg_console_mode == CONSOLE_INTERACTIVE ? DISSECT_IMAGE_ALLOW_INTERACTIVE_AUTH : 0);
(arg_console_mode == CONSOLE_INTERACTIVE ? DISSECT_IMAGE_ALLOW_INTERACTIVE_AUTH : 0) |
((arg_userns_ownership == USER_NAMESPACE_OWNERSHIP_FOREIGN) ? DISSECT_IMAGE_FOREIGN_UID :
(arg_userns_ownership != USER_NAMESPACE_OWNERSHIP_AUTO) ? DISSECT_IMAGE_IDENTITY_UID : 0);
}
static int outer_child(
Barrier *barrier,
const char *directory,
int mount_fd,
DissectedImage *dissected_image,
int fd_outer_socket,
int fd_inner_socket,
@ -3932,7 +3977,6 @@ static int outer_child(
_cleanup_(bind_user_context_freep) BindUserContext *bind_user_context = NULL;
_cleanup_strv_free_ char **os_release_pairs = NULL;
_cleanup_close_ int fd = -EBADF, mntns_fd = -EBADF;
bool idmap = false, enable_fuse;
const char *p;
pid_t pid;
@ -3942,9 +3986,9 @@ static int outer_child(
/* This is the "outer" child process, i.e the one forked off by the container manager itself. Its
* namespace situation is:
*
* - CLONE_NEWNS : already has its own (created by clone() if arg_privileged, or unshare() if !arg_unprivileged)
* - CLONE_NEWUSER : if arg_privileged: still in the host's
* if !arg_privileged: already has its own (created by nsresource_allocate_userns()->setns(userns_fd))
* - CLONE_NEWUSER : if not in USER_NAMESPACE_MANAGED mode: still in the host's
* if USER_NAMESPACE_MANAGED mode: already has its own (created by nsresource_allocate_userns()->setns(userns_fd))
* - CLONE_NEWNS : already has its own (created by clone() if not USER_NAMESPACE_MANAGED, or unshare() otherwise)
* - CLONE_NEWPID : still in the host's
* - CLONE_NEWUTS : still in the host's
* - CLONE_NEWIPC : still in the host's
@ -3977,6 +4021,14 @@ static int outer_child(
if (r < 0)
return r;
if (mount_fd >= 0) {
if (move_mount(mount_fd, "", AT_FDCWD, directory, MOVE_MOUNT_F_EMPTY_PATH) < 0)
return log_error_errno(errno, "Failed to attach root directory: %m");
mount_fd = safe_close(mount_fd);
log_debug("Successfully attached root directory to '%s'.", directory);
}
if (dissected_image) {
/* If we are operating on a disk image, then mount its root directory now, but leave out the
* rest. We can read the UID shift from it if we need to. Further down we'll mount the rest,
@ -4000,7 +4052,21 @@ static int outer_child(
if (r < 0)
return r;
/* If we do userns on our own, we need to chown() all files ourselves before. Otherwise, if userns is
* off or we are in managed mode we already have the userns applied, hence don't need to chown
* anything */
uid_t chown_uid, chown_range;
if (in_child_chown()) {
chown_uid = arg_uid_shift;
chown_range = arg_uid_range;
} else {
chown_uid = 0;
chown_range = UINT32_C(0x10000);
}
if (arg_userns_mode != USER_NAMESPACE_NO) {
_cleanup_close_ int mntns_fd = -EBADF;
r = namespace_open(0,
/* ret_pidns_fd = */ NULL,
&mntns_fd,
@ -4034,28 +4100,15 @@ static int outer_child(
if (l != sizeof(arg_uid_shift))
return log_error_errno(SYNTHETIC_ERRNO(EIO),
"Short read while receiving UID shift.");
if (in_child_chown())
chown_uid = arg_uid_shift;
}
log_full(arg_quiet ? LOG_DEBUG : LOG_INFO,
"Selected user namespace base " UID_FMT " and range " UID_FMT ".", arg_uid_shift, arg_uid_range);
}
if (path_equal(directory, "/")) {
/* If the directory we shall boot is the host, let's operate on a bind mount at a different
* place, so that we can make changes to its mount structure (for example, to implement
* --volatile=) without this interfering with our ability to access files such as
* /etc/localtime to copy into the container. Note that we use a fixed place for this
* (instead of a temporary directory, since we are living in our own mount namespace here
* already, and thus don't need to be afraid of colliding with anyone else's mounts). */
(void) mkdir_p("/run/systemd/nspawn-root", 0755);
r = mount_nofollow_verbose(LOG_ERR, "/", "/run/systemd/nspawn-root", NULL, MS_BIND|MS_REC, NULL);
if (r < 0)
return r;
directory = "/run/systemd/nspawn-root";
}
/* Make sure we always have a mount that we can move to root later on. */
r = make_mount_point(directory);
if (r < 0)
@ -4079,7 +4132,7 @@ static int outer_child(
r = setup_volatile_mode(
directory,
arg_volatile_mode,
arg_uid_shift,
chown_uid,
arg_selinux_apifs_context);
if (r < 0)
return r;
@ -4087,8 +4140,8 @@ static int outer_child(
r = bind_user_prepare(
directory,
arg_bind_user,
arg_uid_shift,
arg_uid_range,
chown_uid,
chown_range,
&arg_custom_mounts, &arg_n_custom_mounts,
&bind_user_context);
if (r < 0)
@ -4119,17 +4172,47 @@ static int outer_child(
directory,
arg_custom_mounts,
arg_n_custom_mounts,
arg_uid_shift,
arg_uid_range,
chown_uid,
chown_range,
arg_selinux_apifs_context,
MOUNT_ROOT_ONLY);
if (r < 0)
return r;
if (arg_userns_mode != USER_NAMESPACE_NO &&
IN_SET(arg_userns_ownership, USER_NAMESPACE_OWNERSHIP_MAP, USER_NAMESPACE_OWNERSHIP_AUTO) &&
arg_uid_shift != 0) {
if (!IN_SET(arg_userns_mode, USER_NAMESPACE_NO, USER_NAMESPACE_MANAGED) &&
IN_SET(arg_userns_ownership, USER_NAMESPACE_OWNERSHIP_MAP, USER_NAMESPACE_OWNERSHIP_FOREIGN, USER_NAMESPACE_OWNERSHIP_AUTO) &&
chown_uid != 0) {
_cleanup_strv_free_ char **dirs = NULL;
RemountIdmapping mapping;
switch (arg_userns_ownership) {
case USER_NAMESPACE_OWNERSHIP_MAP:
mapping = REMOUNT_IDMAPPING_HOST_ROOT;
break;
case USER_NAMESPACE_OWNERSHIP_FOREIGN:
mapping = REMOUNT_IDMAPPING_FOREIGN_WITH_HOST_ROOT;
break;
case USER_NAMESPACE_OWNERSHIP_AUTO: {
struct stat st;
if (lstat(directory, &st) < 0)
return log_error_errno(errno, "Failed to stat() container root directory '%s': %m", directory);
r = stat_verify_directory(&st);
if (r < 0)
return log_error_errno(r, "Container root directory '%s' is not a directory: %m", directory);
mapping = uid_is_foreign(st.st_uid) ?
REMOUNT_IDMAPPING_FOREIGN_WITH_HOST_ROOT :
REMOUNT_IDMAPPING_HOST_ROOT;
break;
}
default:
assert_not_reached();
}
if (arg_volatile_mode != VOLATILE_YES) {
r = strv_extend(&dirs, directory);
@ -4148,7 +4231,13 @@ static int outer_child(
return log_oom();
}
r = remount_idmap(dirs, arg_uid_shift, arg_uid_range, UID_INVALID, UID_INVALID, REMOUNT_IDMAPPING_HOST_ROOT);
r = remount_idmap(
dirs,
chown_uid,
chown_range,
/* host_owner= */ UID_INVALID,
/* dest_owner= */ UID_INVALID,
mapping);
if (r == -EINVAL || ERRNO_IS_NEG_NOT_SUPPORTED(r)) {
/* This might fail because the kernel or file system doesn't support idmapping. We
* can't really distinguish this nicely, nor do we have any guarantees about the
@ -4170,7 +4259,7 @@ static int outer_child(
r = setup_volatile_mode_after_remount_idmap(
directory,
arg_volatile_mode,
arg_uid_shift,
chown_uid,
arg_selinux_apifs_context);
if (r < 0)
return r;
@ -4180,8 +4269,8 @@ static int outer_child(
r = dissected_image_mount_and_warn(
dissected_image,
directory,
arg_uid_shift,
arg_uid_range,
chown_uid,
chown_range,
/* userns_fd= */ -EBADF,
determine_dissect_image_flags()|
DISSECT_IMAGE_MOUNT_NON_ROOT_ONLY|
@ -4205,11 +4294,14 @@ static int outer_child(
"Short write while sending cgroup mode.");
}
r = recursive_chown(directory, arg_uid_shift, arg_uid_range);
r = recursive_chown(directory, chown_uid, chown_range);
if (r < 0)
return r;
r = base_filesystem_create(directory, arg_uid_shift, (gid_t) arg_uid_shift);
r = base_filesystem_create(
directory,
chown_uid,
(gid_t) chown_uid);
if (r < 0)
return r;
@ -4222,7 +4314,7 @@ static int outer_child(
r = mount_all(directory,
arg_mount_settings,
arg_uid_shift,
chown_uid,
arg_selinux_apifs_context);
if (r < 0)
return r;
@ -4240,16 +4332,16 @@ static int outer_child(
if (r < 0)
return r;
(void) dev_setup(directory, arg_uid_shift, arg_uid_shift);
(void) dev_setup(directory, chown_uid, chown_uid);
p = prefix_roota(directory, "/run/host");
(void) make_inaccessible_nodes(p, arg_uid_shift, arg_uid_shift);
(void) make_inaccessible_nodes(p, chown_uid, chown_uid);
r = setup_unix_export_host_inside(directory, unix_export_path);
if (r < 0)
return r;
r = setup_pts(directory);
r = setup_pts(directory, chown_uid);
if (r < 0)
return r;
@ -4273,8 +4365,8 @@ static int outer_child(
directory,
arg_custom_mounts,
arg_n_custom_mounts,
arg_uid_shift,
arg_uid_range,
chown_uid,
chown_range,
arg_selinux_apifs_context,
MOUNT_NON_ROOT_ONLY);
if (r < 0)
@ -4309,8 +4401,8 @@ static int outer_child(
directory,
arg_unified_cgroup_hierarchy,
arg_userns_mode != USER_NAMESPACE_NO,
arg_uid_shift,
arg_uid_range,
chown_uid,
chown_range,
arg_selinux_apifs_context,
false);
if (r < 0)
@ -4325,7 +4417,8 @@ static int outer_child(
* visible. Hence there we do it the other way round: we first allocate a new set of namespaces
* (and fork for it) for which we then mount sysfs/procfs, and only then switch root. */
if (arg_privileged) {
_cleanup_close_ int notify_fd = -EBADF;
if (arg_userns_mode != USER_NAMESPACE_MANAGED) {
/* Mark everything as shared so our mounts get propagated down. This is required to make new
* bind mounts available in systemd services inside the container that create a new mount
* namespace. See https://github.com/systemd/systemd/issues/3860 Further submounts (such as
@ -4355,21 +4448,21 @@ static int outer_child(
* Note, the inner child wouldn't be able to unmount the instances on its own since
* it doesn't own the originating mount namespace. IOW, the outer child needs to do
* this. */
r = pin_fully_visible_fs();
r = pin_fully_visible_api_fs();
if (r < 0)
return r;
}
fd = setup_notify_child(NULL);
notify_fd = setup_notify_child(NULL);
} else
fd = setup_notify_child(directory);
if (fd < 0)
return fd;
notify_fd = setup_notify_child(directory);
if (notify_fd < 0)
return notify_fd;
pid = raw_clone(SIGCHLD|CLONE_NEWNS|
arg_clone_ns_flags |
(arg_userns_mode != USER_NAMESPACE_NO ? CLONE_NEWUSER : 0) |
((arg_private_network && !arg_privileged) ? CLONE_NEWNET : 0));
(IN_SET(arg_userns_mode, USER_NAMESPACE_FIXED, USER_NAMESPACE_PICK) ? CLONE_NEWUSER : 0) |
((arg_private_network && arg_userns_mode == USER_NAMESPACE_MANAGED) ? CLONE_NEWNET : 0));
if (pid < 0)
return log_error_errno(errno, "Failed to fork inner child: %m");
if (pid == 0) {
@ -4388,7 +4481,7 @@ static int outer_child(
return log_error_errno(r, "Failed to join network namespace: %m");
}
if (!arg_privileged) {
if (arg_userns_mode == USER_NAMESPACE_MANAGED) {
/* In unprivileged operation, sysfs + procfs are special, we'll have to mount them
* inside the inner namespaces, but before we switch root. Hence do so here. */
_cleanup_free_ char *j = path_join(directory, "/proc");
@ -4429,7 +4522,7 @@ static int outer_child(
return log_error_errno(SYNTHETIC_ERRNO(EIO),
"Short write while sending machine ID.");
l = send_one_fd(fd_outer_socket, fd, 0);
l = send_one_fd(fd_outer_socket, notify_fd, 0);
if (l < 0)
return log_error_errno(l, "Failed to send notify fd: %m");
@ -5074,7 +5167,7 @@ static int load_settings(void) {
return 0;
/* We first look in the admin's directories in /etc and /run */
if (arg_privileged) {
if (arg_privileged)
FOREACH_STRING(i, "/etc/systemd/nspawn", "/run/systemd/nspawn") {
_cleanup_free_ char *j = NULL;
@ -5096,7 +5189,6 @@ static int load_settings(void) {
if (errno != ENOENT)
return log_error_errno(errno, "Failed to open %s: %m", j);
}
}
if (!f) {
/* After that, let's look for a file next to the
@ -5154,6 +5246,8 @@ static int load_oci_bundle(void) {
}
static int run_container(
const char *directory,
int mount_fd,
DissectedImage *dissected_image,
int userns_fd,
FDSet *fds,
@ -5248,9 +5342,8 @@ static int run_container(
"Path %s doesn't refer to a network namespace, refusing.", arg_network_namespace_path);
}
if (arg_privileged) {
if (arg_userns_mode != USER_NAMESPACE_MANAGED) {
assert(userns_fd < 0);
/* If we have no user namespace then we'll clone and create a new mount namespace right-away. */
*pid = raw_clone(SIGCHLD|CLONE_NEWNS);
@ -5260,7 +5353,6 @@ static int run_container(
", do you have namespace support enabled in your kernel? (You need UTS, IPC, PID and NET namespacing built in)" : "");
} else {
assert(userns_fd >= 0);
/* If we have a user namespace then we'll clone() first, and then join the user namespace,
* and then open the mount namespace, so that it is owned by the user namespace */
@ -5298,7 +5390,8 @@ static int run_container(
(void) reset_signal_mask();
r = outer_child(&barrier,
arg_directory,
directory,
mount_fd,
dissected_image,
fd_outer_socket_pair[1],
fd_inner_socket_pair[1],
@ -5416,9 +5509,11 @@ static int run_container(
if (!barrier_place_and_sync(&barrier)) /* #1 */
return log_error_errno(SYNTHETIC_ERRNO(ESRCH), "Child died too early.");
if (arg_userns_mode != USER_NAMESPACE_MANAGED) {
r = setup_uid_map(*pid, bind_user_uid, n_bind_user_uid);
if (r < 0)
return r;
}
(void) barrier_place(&barrier); /* #2 */
}
@ -5442,7 +5537,7 @@ static int run_container(
return r;
if (arg_network_veth) {
if (arg_privileged) {
if (arg_userns_mode != USER_NAMESPACE_MANAGED) {
r = setup_veth(arg_machine, *pid, veth_name,
arg_network_bridge || arg_network_zone, &arg_network_provided_mac);
if (r < 0)
@ -5580,7 +5675,7 @@ static int run_container(
arg_unified_cgroup_hierarchy,
arg_uid_shift,
userns_fd,
arg_privileged);
arg_userns_mode);
if (r < 0)
return r;
@ -5622,8 +5717,8 @@ static int run_container(
if (!barrier_sync(&barrier)) /* #5.1 */
return log_error_errno(SYNTHETIC_ERRNO(ESRCH), "Child died too early.");
if (arg_userns_mode != USER_NAMESPACE_NO) {
r = wipe_fully_visible_fs(mntns_fd);
if (!IN_SET(arg_userns_mode, USER_NAMESPACE_NO, USER_NAMESPACE_MANAGED)) {
r = wipe_fully_visible_api_fs(mntns_fd);
if (r < 0)
return r;
mntns_fd = safe_close(mntns_fd);
@ -5749,7 +5844,7 @@ static int run_container(
fd_kmsg_fifo = safe_close(fd_kmsg_fifo);
if (arg_private_network && arg_privileged) {
if (arg_private_network && arg_userns_mode != USER_NAMESPACE_MANAGED) {
r = move_back_network_interfaces(child_netns_fd, arg_network_interfaces);
if (r < 0)
return r;
@ -5914,23 +6009,34 @@ static int cant_be_in_netns(void) {
return 0;
}
static void initialize_defaults(void) {
arg_privileged = getuid() == 0;
/* If running unprivileged default to systemd-nsresourced operation */
arg_userns_mode = arg_privileged ? USER_NAMESPACE_NO : USER_NAMESPACE_MANAGED;
/* Imply private networking for unprivileged operation, since kernel otherwise refuses mounting sysfs */
arg_private_network = !arg_privileged;
}
static int run(int argc, char *argv[]) {
bool remove_directory = false, remove_image = false, veth_created = false, remove_tmprootdir = false;
_cleanup_close_ int master = -EBADF, userns_fd = -EBADF;
bool remove_directory = false, remove_image = false, veth_created = false;
_cleanup_close_ int master = -EBADF, userns_fd = -EBADF, mount_fd = -EBADF;
_cleanup_fdset_free_ FDSet *fds = NULL;
int r, n_fd_passed, ret = EXIT_SUCCESS;
char veth_name[IFNAMSIZ] = "";
struct ExposeArgs expose_args = {};
_cleanup_(release_lock_file) LockFile tree_global_lock = LOCK_FILE_INIT, tree_local_lock = LOCK_FILE_INIT;
char tmprootdir[] = "/tmp/nspawn-root-XXXXXX";
_cleanup_(rmdir_and_freep) char *tmprootdir = NULL;
_cleanup_(loop_device_unrefp) LoopDevice *loop = NULL;
_cleanup_(dissected_image_unrefp) DissectedImage *dissected_image = NULL;
_cleanup_(fw_ctx_freep) FirewallContext *fw_ctx = NULL;
const char *rootdir = NULL;
pid_t pid = 0;
log_setup();
arg_privileged = getuid() == 0;
initialize_defaults();
r = parse_argv(argc, argv);
if (r <= 0)
@ -5987,7 +6093,7 @@ static int run(int argc, char *argv[]) {
/* Reapply environment settings. */
(void) detect_unified_cgroup_hierarchy_from_environment();
if (!arg_privileged) {
if (arg_userns_mode == USER_NAMESPACE_MANAGED) {
r = cg_all_unified();
if (r < 0) {
log_error_errno(r, "Failed to determine if we are in unified cgroupv2 mode: %m");
@ -6023,14 +6129,33 @@ static int run(int argc, char *argv[]) {
if (arg_console_mode == CONSOLE_PIPE) /* if we pass STDERR on to the container, don't add our own logs into it too */
arg_quiet = true;
if (arg_directory) {
assert(!arg_image);
if (arg_userns_mode == USER_NAMESPACE_MANAGED) {
/* Let's allocate a 64K userns first, if managed mode is chosen */
if (!arg_privileged) {
r = log_error_errno(SYNTHETIC_ERRNO(EOPNOTSUPP), "Invoking container from plain directory tree is currently not supported if called without privileges.");
_cleanup_free_ char *userns_name = strjoin("nspawn-", arg_machine);
if (!userns_name) {
r = log_oom();
goto finish;
}
userns_fd = nsresource_allocate_userns(userns_name, UINT64_C(0x10000));
if (userns_fd < 0) {
r = log_error_errno(userns_fd, "Failed to allocate user namespace with 64K users: %m");
goto finish;
}
r = userns_get_base_uid(userns_fd, &arg_uid_shift, /* ret_gid= */ NULL);
if (r < 0) {
log_error_errno(r, "Failed to determine UID shift from userns: %m");
goto finish;
}
arg_uid_range = UINT32_C(0x10000);
}
if (arg_directory) {
assert(!arg_image);
/* Safety precaution: let's not allow running images from the live host OS image, as long as
* /var from the host will propagate into container dynamically (because bad things happen if
* two systems write to the same /var). Let's allow it for the special cases where /var is
@ -6200,6 +6325,15 @@ static int run(int argc, char *argv[]) {
}
}
if (arg_userns_mode == USER_NAMESPACE_MANAGED) {
r = mountfsd_mount_directory(
arg_directory,
userns_fd,
determine_dissect_image_flags(),
&mount_fd);
if (r < 0)
goto finish;
}
} else {
DissectImageFlags dissect_image_flags =
determine_dissect_image_flags();
@ -6274,20 +6408,7 @@ static int run(int argc, char *argv[]) {
dissect_image_flags |= DISSECT_IMAGE_NO_PARTITION_TABLE;
}
if (!mkdtemp(tmprootdir)) {
r = log_error_errno(errno, "Failed to create temporary directory: %m");
goto finish;
}
remove_tmprootdir = true;
arg_directory = strdup(tmprootdir);
if (!arg_directory) {
r = log_oom();
goto finish;
}
if (arg_privileged) {
if (arg_userns_mode != USER_NAMESPACE_MANAGED) {
r = loop_device_make_by_path(
arg_image,
arg_read_only ? O_RDONLY : O_RDWR,
@ -6339,19 +6460,6 @@ static int run(int argc, char *argv[]) {
if (r < 0)
goto finish;
} else {
_cleanup_free_ char *userns_name = strjoin("nspawn-", arg_machine);
if (!userns_name) {
r = log_oom();
goto finish;
}
/* if we are unprivileged, let's allocate a 64K userns first */
userns_fd = nsresource_allocate_userns(userns_name, UINT64_C(0x10000));
if (userns_fd < 0) {
r = log_error_errno(userns_fd, "Failed to allocate user namespace with 64K users: %m");
goto finish;
}
r = mountfsd_mount_image(
arg_image,
userns_fd,
@ -6370,7 +6478,22 @@ static int run(int argc, char *argv[]) {
arg_architecture = dissected_image_architecture(dissected_image);
}
r = custom_mount_prepare_all(arg_directory, arg_custom_mounts, arg_n_custom_mounts);
if (arg_directory && !path_is_root(arg_directory) && arg_userns_mode != USER_NAMESPACE_MANAGED)
/* If we are privileged we can operate directly on the supplied root directory, unless it is
* the host's own root directory. */
rootdir = arg_directory;
else {
/* Otherwise create a tempory directory we operate on */
r = mkdtemp_malloc("/tmp/nspawn-root-XXXXXX", &tmprootdir);
if (r < 0) {
log_error_errno(r, "Failed to create temporary directory: %m");
goto finish;
}
rootdir = tmprootdir;
}
r = custom_mount_prepare_all(rootdir, arg_custom_mounts, arg_n_custom_mounts);
if (r < 0)
goto finish;
@ -6405,6 +6528,8 @@ static int run(int argc, char *argv[]) {
}
for (;;) {
r = run_container(
rootdir,
mount_fd,
dissected_image,
userns_fd,
fds,
@ -6447,12 +6572,7 @@ finish:
log_warning_errno(errno, "Can't remove image file '%s', ignoring: %m", arg_image);
}
if (remove_tmprootdir) {
if (rmdir(tmprootdir) < 0)
log_debug_errno(errno, "Can't remove temporary root directory '%s', ignoring: %m", tmprootdir);
}
if (arg_machine && arg_privileged) {
if (arg_machine && arg_userns_mode != USER_NAMESPACE_MANAGED) {
const char *p;
p = strjoina("/run/systemd/nspawn/propagate/", arg_machine);
@ -6466,7 +6586,7 @@ finish:
expose_port_flush(&fw_ctx, arg_expose_ports, AF_INET, &expose_args.address4);
expose_port_flush(&fw_ctx, arg_expose_ports, AF_INET6, &expose_args.address6);
if (arg_privileged) {
if (arg_userns_mode != USER_NAMESPACE_MANAGED) {
if (veth_created)
(void) remove_veth_links(veth_name, arg_network_veth_extra);
(void) remove_bridge(arg_network_zone);

View File

@ -3,7 +3,7 @@
#include <stdlib.h>
#include "log.h"
#include "nspawn-patch-uid.h"
#include "shift-uid.h"
#include "user-util.h"
#include "string-util.h"
#include "tests.h"

View File

@ -7,6 +7,7 @@
#include <sys/eventfd.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <utmpx.h>
#include "sd-daemon.h"
#include "sd-netlink.h"
@ -608,8 +609,28 @@ static int test_userns_api_support(sd_varlink *link) {
return 0;
}
static int validate_name(sd_varlink *link, const char *name, char **ret) {
_cleanup_free_ char *un = NULL;
static char *random_name(void) {
char *s = NULL;
if (asprintf(&s, "r%016" PRIx64, random_u64()) < 0)
return NULL;
return s;
}
static char *shorten_name(const char *name) {
char *n = strdup(name);
if (!n)
return NULL;
/* make sure it fits into utmpx even if prefixed with "ns-" and suffixed by "-65535" */
strshorten(n, sizeof_field(struct utmpx, ut_user) - 3 - 6);
return n;
}
static int validate_name(sd_varlink *link, const char *name, bool mangle, char **ret) {
int r;
assert(link);
@ -621,13 +642,25 @@ static int validate_name(sd_varlink *link, const char *name, char **ret) {
if (r < 0)
return r;
_cleanup_free_ char *un = NULL;
if (peer_uid == 0) {
if (!userns_name_is_valid(name))
return sd_varlink_error_invalid_parameter_name(link, "name");
if (userns_name_is_valid(name)) {
un = strdup(name);
if (!un)
return -ENOMEM;
} else if (mangle) {
un = shorten_name(name);
if (!un)
return -ENOMEM;
if (!userns_name_is_valid(un)) {
free(un);
un = random_name();
if (!un)
return -ENOMEM;
}
}
} else {
/* The the client is not root then prefix the name with the UID of the peer, so that they
* live in separate namespaces and cannot steal each other's names. */
@ -635,9 +668,25 @@ static int validate_name(sd_varlink *link, const char *name, char **ret) {
if (asprintf(&un, UID_FMT "-%s", peer_uid, name) < 0)
return -ENOMEM;
if (!userns_name_is_valid(un) && mangle) {
_cleanup_free_ char *c = shorten_name(un);
if (!c)
return -ENOMEM;
if (userns_name_is_valid(c))
free_and_replace(un, c);
else {
_cleanup_free_ char *rnd = random_name();
un = mfree(un);
if (asprintf(&un, UID_FMT "-%s", peer_uid, rnd) < 0)
return -ENOMEM;
}
}
}
if (!userns_name_is_valid(un))
return sd_varlink_error_invalid_parameter_name(link, "name");
}
*ret = TAKE_PTR(un);
return 0;
@ -727,6 +776,7 @@ typedef struct AllocateParameters {
unsigned size;
unsigned target;
unsigned userns_fd_idx;
bool mangle_name;
} AllocateParameters;
static int vl_method_allocate_user_range(sd_varlink *link, sd_json_variant *parameters, sd_varlink_method_flags_t flags, void *userdata) {
@ -736,6 +786,7 @@ static int vl_method_allocate_user_range(sd_varlink *link, sd_json_variant *para
{ "size", _SD_JSON_VARIANT_TYPE_INVALID, sd_json_dispatch_uint, offsetof(AllocateParameters, size), SD_JSON_MANDATORY },
{ "target", _SD_JSON_VARIANT_TYPE_INVALID, sd_json_dispatch_uint, offsetof(AllocateParameters, target), 0 },
{ "userNamespaceFileDescriptor", _SD_JSON_VARIANT_TYPE_INVALID, sd_json_dispatch_uint, offsetof(AllocateParameters, userns_fd_idx), SD_JSON_MANDATORY },
{ "mangleName", SD_JSON_VARIANT_BOOLEAN, sd_json_dispatch_stdbool, offsetof(AllocateParameters, mangle_name), 0 },
{}
};
@ -761,7 +812,7 @@ static int vl_method_allocate_user_range(sd_varlink *link, sd_json_variant *para
if (r != 0)
return r;
r = validate_name(link, p.name, &userns_name);
r = validate_name(link, p.name, p.mangle_name, &userns_name);
if (r != 0)
return r;
@ -928,6 +979,7 @@ static int validate_userns_is_safe(sd_varlink *link, int userns_fd) {
typedef struct RegisterParameters {
const char *name;
unsigned userns_fd_idx;
bool mangle_name;
} RegisterParameters;
static int vl_method_register_user_namespace(sd_varlink *link, sd_json_variant *parameters, sd_varlink_method_flags_t flags, void *userdata) {
@ -935,6 +987,7 @@ static int vl_method_register_user_namespace(sd_varlink *link, sd_json_variant *
static const sd_json_dispatch_field dispatch_table[] = {
{ "name", SD_JSON_VARIANT_STRING, sd_json_dispatch_const_string, offsetof(RegisterParameters, name), SD_JSON_MANDATORY },
{ "userNamespaceFileDescriptor", _SD_JSON_VARIANT_TYPE_INVALID, sd_json_dispatch_uint, offsetof(RegisterParameters, userns_fd_idx), SD_JSON_MANDATORY },
{ "mangleName", SD_JSON_VARIANT_BOOLEAN, sd_json_dispatch_stdbool, offsetof(AllocateParameters, mangle_name), 0 },
{}
};
@ -959,7 +1012,7 @@ static int vl_method_register_user_namespace(sd_varlink *link, sd_json_variant *
if (r != 0)
return r;
r = validate_name(link, p.name, &userns_name);
r = validate_name(link, p.name, p.mangle_name, &userns_name);
if (r != 0)
return r;

View File

@ -592,6 +592,9 @@ bool userns_name_is_valid(const char *name) {
/* Checks if the specified string is suitable as user namespace name. */
if (isempty(name))
return false;
if (strlen(name) > NAME_MAX) /* before we use alloca(), let's check for size */
return false;

View File

@ -615,7 +615,7 @@ enum nss_status _nss_systemd_setpwent(int stayopen) {
* (think: LDAP/NIS type situations), and our synthesizing of root/nobody is a robustness fallback
* only, which matters for getpwnam()/getpwuid() primarily, which are the main NSS entrypoints to the
* user database. */
r = userdb_all(nss_glue_userdb_flags() | USERDB_DONT_SYNTHESIZE, &getpwent_data.iterator);
r = userdb_all(nss_glue_userdb_flags() | USERDB_DONT_SYNTHESIZE_INTRINSIC | USERDB_DONT_SYNTHESIZE_FOREIGN, &getpwent_data.iterator);
return r < 0 ? NSS_STATUS_UNAVAIL : NSS_STATUS_SUCCESS;
}
@ -634,8 +634,8 @@ enum nss_status _nss_systemd_setgrent(int stayopen) {
getgrent_data.iterator = userdb_iterator_free(getgrent_data.iterator);
getgrent_data.by_membership = false;
/* See _nss_systemd_setpwent() for an explanation why we use USERDB_DONT_SYNTHESIZE here */
r = groupdb_all(nss_glue_userdb_flags() | USERDB_DONT_SYNTHESIZE, &getgrent_data.iterator);
/* See _nss_systemd_setpwent() for an explanation why we use USERDB_DONT_SYNTHESIZE_INTRINSIC here */
r = groupdb_all(nss_glue_userdb_flags() | USERDB_DONT_SYNTHESIZE_INTRINSIC | USERDB_DONT_SYNTHESIZE_FOREIGN, &getgrent_data.iterator);
return r < 0 ? NSS_STATUS_UNAVAIL : NSS_STATUS_SUCCESS;
}
@ -654,8 +654,8 @@ enum nss_status _nss_systemd_setspent(int stayopen) {
getspent_data.iterator = userdb_iterator_free(getspent_data.iterator);
getspent_data.by_membership = false;
/* See _nss_systemd_setpwent() for an explanation why we use USERDB_DONT_SYNTHESIZE here */
r = userdb_all(nss_glue_userdb_flags() | USERDB_DONT_SYNTHESIZE, &getspent_data.iterator);
/* See _nss_systemd_setpwent() for an explanation why we use USERDB_DONT_SYNTHESIZE_INTRINSIC here */
r = userdb_all(nss_glue_userdb_flags() | USERDB_DONT_SYNTHESIZE_INTRINSIC | USERDB_DONT_SYNTHESIZE_FOREIGN, &getspent_data.iterator);
return r < 0 ? NSS_STATUS_UNAVAIL : NSS_STATUS_SUCCESS;
}
@ -675,7 +675,7 @@ enum nss_status _nss_systemd_setsgent(int stayopen) {
getsgent_data.by_membership = false;
/* See _nss_systemd_setpwent() for an explanation why we use USERDB_DONT_SYNTHESIZE here */
r = groupdb_all(nss_glue_userdb_flags() | USERDB_DONT_SYNTHESIZE, &getsgent_data.iterator);
r = groupdb_all(nss_glue_userdb_flags() | USERDB_DONT_SYNTHESIZE_INTRINSIC | USERDB_DONT_SYNTHESIZE_FOREIGN, &getsgent_data.iterator);
return r < 0 ? NSS_STATUS_UNAVAIL : NSS_STATUS_SUCCESS;
}

View File

@ -14,6 +14,7 @@
#include "main-func.h"
#include "pager.h"
#include "pretty-print.h"
#include "sort-util.h"
#include "string-util.h"
static const char *arg_suffix = NULL;
@ -101,27 +102,50 @@ static const char* const path_table[_SD_PATH_MAX] = {
[SD_PATH_SYSTEMD_USER_ENVIRONMENT_GENERATOR] = "systemd-user-environment-generator",
[SD_PATH_SYSTEMD_SEARCH_SYSTEM_ENVIRONMENT_GENERATOR] = "systemd-search-system-environment-generator",
[SD_PATH_SYSTEMD_SEARCH_USER_ENVIRONMENT_GENERATOR] = "systemd-search-user-environment-generator",
[SD_PATH_SYSTEM_CREDENTIAL_STORE] = "system-credential-store",
[SD_PATH_SYSTEM_SEARCH_CREDENTIAL_STORE] = "system-search-credential-store",
[SD_PATH_SYSTEM_CREDENTIAL_STORE_ENCRYPTED] = "system-credential-store-encrypted",
[SD_PATH_SYSTEM_SEARCH_CREDENTIAL_STORE_ENCRYPTED] = "system-search-credential-store-encrypted",
[SD_PATH_USER_CREDENTIAL_STORE] = "user-credential-store",
[SD_PATH_USER_SEARCH_CREDENTIAL_STORE] = "user-search-credential-store",
[SD_PATH_USER_CREDENTIAL_STORE_ENCRYPTED] = "user-credential-store-encrypted",
[SD_PATH_USER_SEARCH_CREDENTIAL_STORE_ENCRYPTED] = "user-search-credential-store-encrypted",
};
static int order_cmp(const size_t *a, const size_t *b) {
assert(*a < ELEMENTSOF(path_table));
assert(*b < ELEMENTSOF(path_table));
return strcmp(path_table[*a], path_table[*b]);
}
static int list_paths(void) {
int r = 0;
int ret = 0, r;
pager_open(arg_pager_flags);
for (size_t i = 0; i < ELEMENTSOF(path_table); i++) {
_cleanup_free_ char *p = NULL;
int q;
size_t order[ELEMENTSOF(path_table)];
q = sd_path_lookup(i, arg_suffix, &p);
if (q < 0) {
log_full_errno(q == -ENXIO ? LOG_DEBUG : LOG_ERR,
q, "Failed to query %s: %m", path_table[i]);
if (q != -ENXIO)
RET_GATHER(r, q);
for (size_t i = 0; i < ELEMENTSOF(order); i++)
order[i] = i;
typesafe_qsort(order, ELEMENTSOF(order), order_cmp);
for (size_t i = 0; i < ELEMENTSOF(order); i++) {
size_t j = order[i];
const char *t = ASSERT_PTR(path_table[j]);
_cleanup_free_ char *p = NULL;
r = sd_path_lookup(j, arg_suffix, &p);
if (r < 0) {
log_full_errno(r == -ENXIO ? LOG_DEBUG : LOG_ERR, r, "Failed to query %s, proceeding: %m", t);
if (r != -ENXIO)
RET_GATHER(ret, r);
continue;
}
printf("%s%s:%s %s\n", ansi_highlight(), path_table[i], ansi_normal(), p);
printf("%s%s:%s %s\n", ansi_highlight(), t, ansi_normal(), p);
}
return r;
@ -154,14 +178,16 @@ static int help(void) {
if (r < 0)
return log_oom();
printf("%s [OPTIONS...] [NAME...]\n\n"
"Show system and user paths.\n\n"
printf("%s [OPTIONS...] [NAME...]\n"
"\n%sShow system and user paths.%s\n\n"
" -h --help Show this help\n"
" --version Show package version\n"
" --suffix=SUFFIX Suffix to append to paths\n"
" --no-pager Do not pipe output into a pager\n"
"\nSee the %s for details.\n",
program_invocation_short_name,
ansi_highlight(),
ansi_normal(),
link);
return 0;
@ -224,10 +250,11 @@ static int run(int argc, char* argv[]) {
if (r <= 0)
return r;
if (argc > optind)
if (argc > optind) {
r = 0;
for (int i = optind; i < argc; i++)
RET_GATHER(r, print_path(argv[i]));
else
} else
r = list_paths();
return r;

View File

@ -173,6 +173,7 @@ DEFINE_PRIVATE_HASH_OPS_WITH_VALUE_DESTRUCTOR(portable_metadata_hash_ops, char,
PortableMetadata, portable_metadata_unref);
static int extract_now(
RuntimeScope scope,
const char *where,
char **matches,
const char *image_name,
@ -199,6 +200,7 @@ static int extract_now(
* parent. To handle both cases in one call this function also gets a 'socket_fd' parameter, which when >= 0 is
* used to send the data to the parent. */
assert(scope < _RUNTIME_SCOPE_MAX);
assert(where);
/* First, find os-release/extension-release and send it upstream (or just save it). */
@ -248,7 +250,7 @@ static int extract_now(
/* Then, send unit file data to the parent (or/and add it to the hashmap). For that we use our usual unit
* discovery logic. Note that we force looking inside of /lib/systemd/system/ for units too, as the
* image might have a legacy split-usr layout. */
r = lookup_paths_init(&paths, RUNTIME_SCOPE_SYSTEM, LOOKUP_PATHS_SPLIT_USR, where);
r = lookup_paths_init(&paths, scope, LOOKUP_PATHS_SPLIT_USR, where);
if (r < 0)
return log_debug_errno(r, "Failed to acquire lookup paths: %m");
@ -348,6 +350,7 @@ static int extract_now(
}
static int portable_extract_by_path(
RuntimeScope scope,
const char *path,
bool path_is_extension,
bool relax_extension_release_check,
@ -381,7 +384,7 @@ static int portable_extract_by_path(
if (r < 0)
return log_error_errno(r, "Failed to extract image name from path '%s': %m", path);
r = extract_now(path, matches, image_name, path_is_extension, /* relax_extension_release_check= */ false, -1, &os_release, &unit_files);
r = extract_now(scope, path, matches, image_name, path_is_extension, /* relax_extension_release_check= */ false, -1, &os_release, &unit_files);
if (r < 0)
return r;
@ -458,7 +461,7 @@ static int portable_extract_by_path(
goto child_finish;
}
r = extract_now(tmpdir, matches, m->image_name, path_is_extension, relax_extension_release_check, seq[1], NULL, NULL);
r = extract_now(scope, tmpdir, matches, m->image_name, path_is_extension, relax_extension_release_check, seq[1], NULL, NULL);
child_finish:
_exit(r < 0 ? EXIT_FAILURE : EXIT_SUCCESS);
@ -549,6 +552,7 @@ static int portable_extract_by_path(
}
static int extract_image_and_extensions(
RuntimeScope scope,
const char *name_or_path,
char **matches,
char **extension_image_paths,
@ -595,7 +599,7 @@ static int extract_image_and_extensions(
name_or_path = result.path;
}
r = image_find_harder(IMAGE_PORTABLE, name_or_path, /* root= */ NULL, &image);
r = image_find_harder(scope, IMAGE_PORTABLE, name_or_path, /* root= */ NULL, &image);
if (r < 0)
return r;
@ -633,7 +637,7 @@ static int extract_image_and_extensions(
path = ext_result.path;
}
r = image_find_harder(IMAGE_PORTABLE, path, NULL, &new);
r = image_find_harder(scope, IMAGE_PORTABLE, path, NULL, &new);
if (r < 0)
return r;
@ -645,6 +649,7 @@ static int extract_image_and_extensions(
}
r = portable_extract_by_path(
scope,
image->path,
/* path_is_extension= */ false,
/* relax_extension_release_check= */ false,
@ -687,6 +692,7 @@ static int extract_image_and_extensions(
const char *e;
r = portable_extract_by_path(
scope,
ext->path,
/* path_is_extension= */ true,
relax_extension_release_check,
@ -754,6 +760,7 @@ static int extract_image_and_extensions(
}
int portable_extract(
RuntimeScope scope,
const char *name_or_path,
char **matches,
char **extension_image_paths,
@ -775,6 +782,7 @@ int portable_extract(
assert(name_or_path);
r = extract_image_and_extensions(
scope,
name_or_path,
matches,
extension_image_paths,
@ -1426,6 +1434,7 @@ static int image_target_path(
}
static int install_image(
RuntimeScope scope,
const char *image_path,
PortableFlags flags,
PortableChange **changes,
@ -1434,13 +1443,14 @@ static int install_image(
_cleanup_free_ char *target = NULL;
int r;
assert(scope < _RUNTIME_SCOPE_MAX);
assert(image_path);
/* If the image is outside of the image search also link it into it, so that it can be found with
* short image names and is listed among the images. If we are operating in mixed mode, the image is
* copied instead. */
if (image_in_search_path(IMAGE_PORTABLE, NULL, image_path))
if (image_in_search_path(scope, IMAGE_PORTABLE, NULL, image_path))
return 0;
r = image_target_path(image_path, flags, &target);
@ -1485,6 +1495,7 @@ static int install_image(
}
static int install_image_and_extensions(
RuntimeScope scope,
const Image *image,
OrderedHashmap *extension_images,
PortableFlags flags,
@ -1497,12 +1508,12 @@ static int install_image_and_extensions(
assert(image);
ORDERED_HASHMAP_FOREACH(ext, extension_images) {
r = install_image(ext->path, flags, changes, n_changes);
r = install_image(scope, ext->path, flags, changes, n_changes);
if (r < 0)
return r;
}
r = install_image(image->path, flags, changes, n_changes);
r = install_image(scope, image->path, flags, changes, n_changes);
if (r < 0)
return r;
@ -1595,6 +1606,7 @@ static void log_portable_verb(
}
int portable_attach(
RuntimeScope scope,
sd_bus *bus,
const char *name_or_path,
char **matches,
@ -1615,7 +1627,10 @@ int portable_attach(
PortableMetadata *item;
int r;
assert(scope < _RUNTIME_SCOPE_MAX);
r = extract_image_and_extensions(
scope,
name_or_path,
matches,
extension_image_paths,
@ -1672,13 +1687,13 @@ int portable_attach(
strempty(extensions_joined));
}
r = lookup_paths_init(&paths, RUNTIME_SCOPE_SYSTEM, /* flags= */ 0, NULL);
r = lookup_paths_init(&paths, scope, /* flags= */ 0, NULL);
if (r < 0)
return r;
if (!FLAGS_SET(flags, PORTABLE_REATTACH) && !FLAGS_SET(flags, PORTABLE_FORCE_ATTACH))
HASHMAP_FOREACH(item, unit_files) {
r = unit_file_exists(RUNTIME_SCOPE_SYSTEM, &paths, item->name);
r = unit_file_exists(scope, &paths, item->name);
if (r < 0)
return sd_bus_error_set_errnof(error, r, "Failed to determine whether unit '%s' exists on the host: %m", item->name);
if (r > 0)
@ -1700,7 +1715,7 @@ int portable_attach(
/* We don't care too much for the image symlink/copy, it's just a convenience thing, it's not necessary for
* proper operation otherwise. */
(void) install_image_and_extensions(image, extension_images, flags, changes, n_changes);
(void) install_image_and_extensions(scope, image, extension_images, flags, changes, n_changes);
log_portable_verb(
"attached",
@ -1844,6 +1859,7 @@ static int test_chroot_dropin(
}
int portable_detach(
RuntimeScope scope,
sd_bus *bus,
const char *name_or_path,
char **extension_image_paths,
@ -1857,12 +1873,12 @@ int portable_detach(
_cleanup_free_ char *extensions = NULL;
_cleanup_closedir_ DIR *d = NULL;
const char *where, *item;
int ret = 0;
int r;
int r, ret = 0;
assert(scope < _RUNTIME_SCOPE_MAX);
assert(name_or_path);
r = lookup_paths_init(&paths, RUNTIME_SCOPE_SYSTEM, /* flags= */ 0, NULL);
r = lookup_paths_init(&paths, scope, /* flags= */ 0, NULL);
if (r < 0)
return r;
@ -1930,7 +1946,7 @@ int portable_detach(
if (r == 0)
break;
if (path_is_absolute(image) && !image_in_search_path(IMAGE_PORTABLE, NULL, image)) {
if (path_is_absolute(image) && !image_in_search_path(scope, IMAGE_PORTABLE, NULL, image)) {
r = set_ensure_consume(&markers, &path_hash_ops_free, TAKE_PTR(image));
if (r < 0)
return r;
@ -2031,6 +2047,7 @@ not_found:
}
static int portable_get_state_internal(
RuntimeScope scope,
sd_bus *bus,
const char *name_or_path,
char **extension_image_paths,
@ -2045,10 +2062,11 @@ static int portable_get_state_internal(
const char *where;
int r;
assert(scope < _RUNTIME_SCOPE_MAX);
assert(name_or_path);
assert(ret);
r = lookup_paths_init(&paths, RUNTIME_SCOPE_SYSTEM, /* flags= */ 0, NULL);
r = lookup_paths_init(&paths, scope, /* flags= */ 0, NULL);
if (r < 0)
return r;
@ -2084,7 +2102,7 @@ static int portable_get_state_internal(
if (r == 0)
continue;
r = unit_file_lookup_state(RUNTIME_SCOPE_SYSTEM, &paths, de->d_name, &state);
r = unit_file_lookup_state(scope, &paths, de->d_name, &state);
if (r < 0)
return log_debug_errno(r, "Failed to determine unit file state of '%s': %m", de->d_name);
if (!IN_SET(state, UNIT_FILE_STATIC, UNIT_FILE_DISABLED, UNIT_FILE_LINKED, UNIT_FILE_LINKED_RUNTIME))
@ -2109,6 +2127,7 @@ static int portable_get_state_internal(
}
int portable_get_state(
RuntimeScope scope,
sd_bus *bus,
const char *name_or_path,
char **extension_image_paths,
@ -2125,12 +2144,19 @@ int portable_get_state(
/* We look for matching units twice: once in the regular directories, and once in the runtime directories — but
* the latter only if we didn't find anything in the former. */
r = portable_get_state_internal(bus, name_or_path, extension_image_paths, flags & ~PORTABLE_RUNTIME, &state, error);
r = portable_get_state_internal(
scope,
bus,
name_or_path,
extension_image_paths,
flags & ~PORTABLE_RUNTIME,
&state,
error);
if (r < 0)
return r;
if (state == PORTABLE_DETACHED) {
r = portable_get_state_internal(bus, name_or_path, extension_image_paths, flags | PORTABLE_RUNTIME, &state, error);
r = portable_get_state_internal(scope, bus, name_or_path, extension_image_paths, flags | PORTABLE_RUNTIME, &state, error);
if (r < 0)
return r;
}

View File

@ -6,6 +6,7 @@
#include "dissect-image.h"
#include "hashmap.h"
#include "macro.h"
#include "runtime-scope.h"
#include "set.h"
#include "string-util.h"
@ -69,12 +70,12 @@ DEFINE_TRIVIAL_CLEANUP_FUNC(PortableMetadata*, portable_metadata_unref);
int portable_metadata_hashmap_to_sorted_array(Hashmap *unit_files, PortableMetadata ***ret);
int portable_extract(const char *image, char **matches, char **extension_image_paths, const ImagePolicy *image_policy, PortableFlags flags, PortableMetadata **ret_os_release, OrderedHashmap **ret_extension_releases, Hashmap **ret_unit_files, char ***ret_valid_prefixes, sd_bus_error *error);
int portable_extract(RuntimeScope scope, const char *image, char **matches, char **extension_image_paths, const ImagePolicy *image_policy, PortableFlags flags, PortableMetadata **ret_os_release, OrderedHashmap **ret_extension_releases, Hashmap **ret_unit_files, char ***ret_valid_prefixes, sd_bus_error *error);
int portable_attach(sd_bus *bus, const char *name_or_path, char **matches, const char *profile, char **extension_images, const ImagePolicy* image_policy, PortableFlags flags, PortableChange **changes, size_t *n_changes, sd_bus_error *error);
int portable_detach(sd_bus *bus, const char *name_or_path, char **extension_image_paths, PortableFlags flags, PortableChange **changes, size_t *n_changes, sd_bus_error *error);
int portable_attach(RuntimeScope scope, sd_bus *bus, const char *name_or_path, char **matches, const char *profile, char **extension_images, const ImagePolicy* image_policy, PortableFlags flags, PortableChange **changes, size_t *n_changes, sd_bus_error *error);
int portable_detach(RuntimeScope scope, sd_bus *bus, const char *name_or_path, char **extension_image_paths, PortableFlags flags, PortableChange **changes, size_t *n_changes, sd_bus_error *error);
int portable_get_state(sd_bus *bus, const char *name_or_path, char **extension_image_paths, PortableFlags flags, PortableState *ret, sd_bus_error *error);
int portable_get_state(RuntimeScope scope, sd_bus *bus, const char *name_or_path, char **extension_image_paths, PortableFlags flags, PortableState *ret, sd_bus_error *error);
int portable_get_profiles(char ***ret);

View File

@ -165,6 +165,7 @@ static int method_list_images(sd_bus_message *message, void *userdata, sd_bus_er
return r;
r = portable_get_state(
m->runtime_scope,
sd_bus_message_get_bus(message),
image->path,
NULL,
@ -225,6 +226,7 @@ static int method_get_image_metadata(sd_bus_message *message, void *userdata, sd
static int method_get_image_state(sd_bus_message *message, void *userdata, sd_bus_error *error) {
_cleanup_strv_free_ char **extension_images = NULL;
Manager *m = ASSERT_PTR(userdata);
const char *name_or_path;
PortableState state;
int r;
@ -254,6 +256,7 @@ static int method_get_image_state(sd_bus_message *message, void *userdata, sd_bu
}
r = portable_get_state(
m->runtime_scope,
sd_bus_message_get_bus(message),
name_or_path,
extension_images,
@ -330,6 +333,7 @@ static int method_detach_image(sd_bus_message *message, void *userdata, sd_bus_e
return 1; /* Will call us back */
r = portable_detach(
m->runtime_scope,
sd_bus_message_get_bus(message),
name_or_path,
extension_images,

View File

@ -114,10 +114,8 @@ int bus_image_common_get_metadata(
assert(name_or_path || image);
assert(message);
if (!m) {
assert(image);
m = image->userdata;
}
if (!m)
m = ASSERT_PTR(ASSERT_PTR(image)->userdata);
bool have_exti = sd_bus_message_is_method_call(message, NULL, "GetImageMetadataWithExtensions") ||
sd_bus_message_is_method_call(message, NULL, "GetMetadataWithExtensions");
@ -160,6 +158,7 @@ int bus_image_common_get_metadata(
return 1;
r = portable_extract(
m->runtime_scope,
image->path,
matches,
extension_images,
@ -264,6 +263,7 @@ static int bus_image_method_get_state(
_cleanup_strv_free_ char **extension_images = NULL;
Image *image = ASSERT_PTR(userdata);
Manager *m = ASSERT_PTR(image->userdata);
PortableState state;
int r;
@ -288,6 +288,7 @@ static int bus_image_method_get_state(
}
r = portable_get_state(
m->runtime_scope,
sd_bus_message_get_bus(message),
image->path,
extension_images,
@ -385,6 +386,7 @@ int bus_image_common_attach(
return 1;
r = portable_attach(
m->runtime_scope,
sd_bus_message_get_bus(message),
image->path,
matches,
@ -463,6 +465,7 @@ static int bus_image_method_detach(
return 1; /* Will call us back */
r = portable_detach(
m->runtime_scope,
sd_bus_message_get_bus(message),
image->path,
extension_images,
@ -513,6 +516,7 @@ int bus_image_common_remove(
return 1; /* Will call us back */
r = portable_get_state(
m->runtime_scope,
sd_bus_message_get_bus(message),
image->path,
NULL,
@ -716,6 +720,7 @@ int bus_image_common_reattach(
return 1;
r = portable_detach(
m->runtime_scope,
sd_bus_message_get_bus(message),
image->path,
extension_images,
@ -727,6 +732,7 @@ int bus_image_common_reattach(
return r;
r = portable_attach(
m->runtime_scope,
sd_bus_message_get_bus(message),
image->path,
matches,
@ -1039,7 +1045,7 @@ int bus_image_acquire(
if (image_name_is_valid(name_or_path)) {
/* If it's a short name, let's search for it */
r = image_find(IMAGE_PORTABLE, name_or_path, NULL, &loaded);
r = image_find(m->runtime_scope, IMAGE_PORTABLE, name_or_path, NULL, &loaded);
if (r == -ENOENT)
return sd_bus_error_setf(error, BUS_ERROR_NO_SUCH_PORTABLE_IMAGE,
"No image '%s' found.", name_or_path);

View File

@ -91,7 +91,7 @@ int manager_image_cache_discover(Manager *m, Hashmap *images, sd_bus_error *erro
/* A wrapper around image_discover() (for finding images in search path) and portable_discover_attached() (for
* finding attached images). */
r = image_discover(IMAGE_PORTABLE, NULL, images);
r = image_discover(m->runtime_scope, IMAGE_PORTABLE, NULL, images);
if (r < 0)
return r;

View File

@ -28,10 +28,14 @@ static int manager_new(Manager **ret) {
assert(ret);
m = new0(Manager, 1);
m = new(Manager, 1);
if (!m)
return -ENOMEM;
*m = (Manager) {
.runtime_scope = RUNTIME_SCOPE_SYSTEM,
};
r = sd_event_default(&m->event);
if (r < 0)
return r;

View File

@ -7,6 +7,7 @@
#include "bus-object.h"
#include "hashmap.h"
#include "list.h"
#include "runtime-scope.h"
typedef struct Manager Manager;
@ -23,6 +24,8 @@ struct Manager {
LIST_HEAD(Operation, operations);
unsigned n_operations;
RuntimeScope runtime_scope; /* for now always RUNTIME_SCOPE_SYSTEM */
};
extern const BusObjectImplementation manager_object;

View File

@ -21,6 +21,7 @@
#include "battery-util.h"
#include "blockdev-util.h"
#include "cap-list.h"
#include "capability-util.h"
#include "cgroup-util.h"
#include "compare-operator.h"
#include "condition.h"
@ -701,45 +702,23 @@ static int condition_test_security(Condition *c, char **env) {
}
static int condition_test_capability(Condition *c, char **env) {
unsigned long long capabilities = (unsigned long long) -1;
_cleanup_fclose_ FILE *f = NULL;
int value, r;
int r;
assert(c);
assert(c->parameter);
assert(c->type == CONDITION_CAPABILITY);
/* If it's an invalid capability, we don't have it */
value = capability_from_name(c->parameter);
int value = capability_from_name(c->parameter);
if (value < 0)
return -EINVAL;
/* If it's a valid capability we default to assume
* that we have it */
f = fopen("/proc/self/status", "re");
if (!f)
return -errno;
for (;;) {
_cleanup_free_ char *line = NULL;
r = read_line(f, LONG_LINE_MAX, &line);
CapabilityQuintet q;
r = pidref_get_capability(&PIDREF_MAKE_FROM_PID(getpid_cached()), &q);
if (r < 0)
return r;
if (r == 0)
break;
const char *p = startswith(line, "CapBnd:");
if (p) {
if (sscanf(p, "%llx", &capabilities) != 1)
return -EIO;
break;
}
}
return !!(capabilities & (1ULL << value));
return !!(q.bounding & ((UINT64_C(1) << value)));
}
static int condition_test_needs_update(Condition *c, char **env) {

View File

@ -12,6 +12,8 @@
#include <sys/stat.h>
#include <unistd.h>
#include "sd-path.h"
#include "alloc-util.h"
#include "blockdev-util.h"
#include "btrfs-util.h"
@ -553,12 +555,95 @@ static int image_make(
return -EMEDIUMTYPE;
}
static const char *pick_image_search_path(ImageClass class) {
if (class < 0 || class >= _IMAGE_CLASS_MAX)
return NULL;
static int pick_image_search_path(
RuntimeScope scope,
ImageClass class,
char ***ret) {
int r;
assert(scope < _RUNTIME_SCOPE_MAX && scope != RUNTIME_SCOPE_GLOBAL);
assert(class < _IMAGE_CLASS_MAX);
assert(ret);
if (class < 0) {
*ret = NULL;
return 0;
}
if (scope < 0) {
_cleanup_strv_free_ char **a = NULL, **b = NULL;
r = pick_image_search_path(RUNTIME_SCOPE_USER, class, &a);
if (r < 0)
return r;
r = pick_image_search_path(RUNTIME_SCOPE_SYSTEM, class, &b);
if (r < 0)
return r;
r = strv_extend_strv(&a, b, /* filter_duplicates= */ false);
if (r < 0)
return r;
*ret = TAKE_PTR(a);
return 0;
}
switch (scope) {
case RUNTIME_SCOPE_SYSTEM: {
const char *ns;
/* Use the initrd search path if there is one, otherwise use the common one */
return in_initrd() && image_search_path_initrd[class] ? image_search_path_initrd[class] : image_search_path[class];
ns = in_initrd() && image_search_path_initrd[class] ?
image_search_path_initrd[class] :
image_search_path[class];
if (!ns)
break;
_cleanup_strv_free_ char **search = strv_split_nulstr(ns);
if (!search)
return -ENOMEM;
*ret = TAKE_PTR(search);
return 0;
}
case RUNTIME_SCOPE_USER: {
if (class != IMAGE_MACHINE)
break;
static const uint64_t dirs[] = {
SD_PATH_USER_RUNTIME,
SD_PATH_USER_STATE_PRIVATE,
SD_PATH_USER_LIBRARY_PRIVATE,
};
_cleanup_strv_free_ char **search = NULL;
FOREACH_ELEMENT(d, dirs) {
_cleanup_free_ char *p = NULL;
r = sd_path_lookup(*d, "machines", &p);
if (r == -ENXIO) /* No XDG_RUNTIME_DIR set */
continue;
if (r < 0)
return r;
r = strv_consume(&search, TAKE_PTR(p));
if (r < 0)
return r;
}
*ret = TAKE_PTR(search);
return 0;
}
default:
assert_not_reached();
}
*ret = NULL;
return 0;
}
static char **make_possible_filenames(ImageClass class, const char *image_name) {
@ -592,7 +677,8 @@ static char **make_possible_filenames(ImageClass class, const char *image_name)
return TAKE_PTR(l);
}
int image_find(ImageClass class,
int image_find(RuntimeScope scope,
ImageClass class,
const char *name,
const char *root,
Image **ret) {
@ -602,6 +688,7 @@ int image_find(ImageClass class,
* some root directory.) */
int open_flags = root ? O_NOFOLLOW : 0, r;
assert(scope < _RUNTIME_SCOPE_MAX && scope != RUNTIME_SCOPE_GLOBAL);
assert(class >= 0);
assert(class < _IMAGE_CLASS_MAX);
assert(name);
@ -614,11 +701,16 @@ int image_find(ImageClass class,
if (!names)
return -ENOMEM;
NULSTR_FOREACH(path, pick_image_search_path(class)) {
_cleanup_strv_free_ char **search = NULL;
r = pick_image_search_path(scope, class, &search);
if (r < 0)
return r;
STRV_FOREACH(path, search) {
_cleanup_free_ char *resolved = NULL;
_cleanup_closedir_ DIR *d = NULL;
r = chase_and_opendir(path, root, CHASE_PREFIX_ROOT, &resolved, &d);
r = chase_and_opendir(*path, root, CHASE_PREFIX_ROOT, &resolved, &d);
if (r == -ENOENT)
continue;
if (r < 0)
@ -722,7 +814,7 @@ int image_find(ImageClass class,
}
}
if (class == IMAGE_MACHINE && streq(name, ".host")) {
if (scope == RUNTIME_SCOPE_SYSTEM && class == IMAGE_MACHINE && streq(name, ".host")) {
r = image_make(class,
".host",
/* dir_fd= */ AT_FDCWD,
@ -771,14 +863,21 @@ int image_from_path(const char *path, Image **ret) {
ret);
}
int image_find_harder(ImageClass class, const char *name_or_path, const char *root, Image **ret) {
int image_find_harder(
RuntimeScope scope,
ImageClass class,
const char *name_or_path,
const char *root,
Image **ret) {
if (image_name_is_valid(name_or_path))
return image_find(class, name_or_path, root, ret);
return image_find(scope, class, name_or_path, root, ret);
return image_from_path(name_or_path, ret);
}
int image_discover(
RuntimeScope scope,
ImageClass class,
const char *root,
Hashmap *h) {
@ -788,15 +887,21 @@ int image_discover(
* some root directory.) */
int open_flags = root ? O_NOFOLLOW : 0, r;
assert(scope < _RUNTIME_SCOPE_MAX && scope != RUNTIME_SCOPE_GLOBAL);
assert(class >= 0);
assert(class < _IMAGE_CLASS_MAX);
assert(h);
NULSTR_FOREACH(path, pick_image_search_path(class)) {
_cleanup_strv_free_ char **search = NULL;
r = pick_image_search_path(scope, class, &search);
if (r < 0)
return r;
STRV_FOREACH(path, search) {
_cleanup_free_ char *resolved = NULL;
_cleanup_closedir_ DIR *d = NULL;
r = chase_and_opendir(path, root, CHASE_PREFIX_ROOT, &resolved, &d);
r = chase_and_opendir(*path, root, CHASE_PREFIX_ROOT, &resolved, &d);
if (r == -ENOENT)
continue;
if (r < 0)
@ -946,7 +1051,7 @@ int image_discover(
}
}
if (class == IMAGE_MACHINE && !hashmap_contains(h, ".host")) {
if (scope == RUNTIME_SCOPE_SYSTEM && class == IMAGE_MACHINE && !hashmap_contains(h, ".host")) {
_cleanup_(image_unrefp) Image *image = NULL;
r = image_make(IMAGE_MACHINE,
@ -1063,7 +1168,7 @@ static int rename_auxiliary_file(const char *path, const char *new_name, const c
return rename_noreplace(AT_FDCWD, path, AT_FDCWD, rs);
}
int image_rename(Image *i, const char *new_name) {
int image_rename(Image *i, const char *new_name, RuntimeScope scope) {
_cleanup_(release_lock_file) LockFile global_lock = LOCK_FILE_INIT, local_lock = LOCK_FILE_INIT, name_lock = LOCK_FILE_INIT;
_cleanup_free_ char *new_path = NULL, *nn = NULL, *roothash = NULL;
_cleanup_strv_free_ char **settings = NULL;
@ -1098,7 +1203,7 @@ int image_rename(Image *i, const char *new_name) {
if (r < 0)
return r;
r = image_find(IMAGE_MACHINE, new_name, NULL, NULL);
r = image_find(scope, IMAGE_MACHINE, new_name, NULL, NULL);
if (r >= 0)
return -EEXIST;
if (r != -ENOENT)
@ -1185,7 +1290,7 @@ static int clone_auxiliary_file(const char *path, const char *new_name, const ch
return copy_file_atomic(path, rs, 0664, COPY_REFLINK);
}
int image_clone(Image *i, const char *new_name, bool read_only) {
int image_clone(Image *i, const char *new_name, bool read_only, RuntimeScope scope) {
_cleanup_(release_lock_file) LockFile name_lock = LOCK_FILE_INIT;
_cleanup_strv_free_ char **settings = NULL;
_cleanup_free_ char *roothash = NULL;
@ -1212,7 +1317,7 @@ int image_clone(Image *i, const char *new_name, bool read_only) {
if (r < 0)
return r;
r = image_find(IMAGE_MACHINE, new_name, NULL, NULL);
r = image_find(scope, IMAGE_MACHINE, new_name, NULL, NULL);
if (r >= 0)
return -EEXIST;
if (r != -ENOENT)
@ -1646,24 +1751,35 @@ int image_name_lock(const char *name, int operation, LockFile *ret) {
}
bool image_in_search_path(
RuntimeScope scope,
ImageClass class,
const char *root,
const char *image) {
int r;
assert(scope < _RUNTIME_SCOPE_MAX && scope != RUNTIME_SCOPE_GLOBAL);
assert(class >= 0);
assert(class < _IMAGE_CLASS_MAX);
assert(image);
NULSTR_FOREACH(path, pick_image_search_path(class)) {
_cleanup_strv_free_ char **search = NULL;
r = pick_image_search_path(scope, class, &search);
if (r < 0)
return r;
STRV_FOREACH(path, search) {
const char *p, *q;
size_t k;
if (!empty_or_root(root)) {
q = path_startswith(path, root);
q = path_startswith(*path, root);
if (!q)
continue;
} else
q = path;
q = *path;
p = path_startswith(q, path);
p = path_startswith(q, *path);
if (!p)
continue;

Some files were not shown because too many files have changed in this diff Show More