From 98b1d2b8d9ea27087a5980b4b902b6a6ab716e03 Mon Sep 17 00:00:00 2001 From: Alan Jenkins Date: Wed, 17 Jan 2018 13:28:04 +0000 Subject: [PATCH 1/3] core: namespace: nitpick /dev/ptmx error handling If /dev/tty did not exist, or had st_rdev == 0, we ignored it. And the same is true for null, zero, full, random, urandom. If /dev/ptmx did not exist, we treated this as a failure. If /dev/ptmx had st_rdev == 0, we ignored it. This was a very recent change, but there was no reason for ptmx creation specifically to treat st_rdev == 0 differently from non-existence. This confuses me when reading it. Change the creation of /dev/ptmx so that st_rdev == 0 is treated as failure. This still leaves /dev/ptmx as a special case with stricter handling. However it is consistent with the immediately preceding creation of /dev/pts/, which is treated as essential, and is directly related to ptmx. I don't know why we check st_rdev. But I'd prefer to have only one unanswered question here, and not to have a second unanswered question added on top. --- src/core/namespace.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/core/namespace.c b/src/core/namespace.c index d6c1b1b2fc1..e20c5007933 100644 --- a/src/core/namespace.c +++ b/src/core/namespace.c @@ -525,7 +525,7 @@ static int clone_device_node(const char *d, const char *temporary_mount) { if (r < 0) return -errno; - return 0; + return 1; } static int mount_private_dev(MountEntry *m) { @@ -582,7 +582,7 @@ static int mount_private_dev(MountEntry *m) { } } else { r = clone_device_node("/dev/ptmx", temporary_mount); - if (r < 0) + if (r != 1) goto fail; } From 45a582d5369b501743ac4dded3a481e6a589c029 Mon Sep 17 00:00:00 2001 From: Alan Jenkins Date: Mon, 15 Jan 2018 16:55:11 +0000 Subject: [PATCH 2/3] README: fix context for CONFIG_DEVPTS_MULTIPLE_INSTANCES `newinstance` (and `ptmxmode`) options of devpts are _not_ used by PrivateDevices=. (/dev/pts is shared, similar to how /dev/shm and /dev/mqueue are handled). It is used by nspawn containers though. Also CONFIG_DEVPTS_MULTIPLE_INSTANCES was removed in 4.7-rc2 https://github.com/torvalds/linux/commit/eedf265aa003b4781de24cfed40a655a664457e6 and no longer needs to be set, so make that clearer to avoid confusion. --- README | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/README b/README index e54c5d6efb8..8807e5cfe49 100644 --- a/README +++ b/README @@ -69,11 +69,10 @@ REQUIREMENTS: create additional symlinks in /dev/disk/ and /dev/tape: CONFIG_BLK_DEV_BSG - Required for PrivateNetwork= and PrivateDevices= in service units: + Required for PrivateNetwork= in service units: CONFIG_NET_NS - CONFIG_DEVPTS_MULTIPLE_INSTANCES Note that systemd-localed.service and other systemd units use - PrivateNetwork and PrivateDevices so this is effectively required. + PrivateNetwork so this is effectively required. Required for PrivateUsers= in service units: CONFIG_USER_NS @@ -119,6 +118,9 @@ REQUIREMENTS: isn't. The next best thing is to make this change through a modprobe.d drop-in. This is shipped by default, see modprobe.d/systemd.conf. + Required for systemd-nspawn: + CONFIG_DEVPTS_MULTIPLE_INSTANCES or Linux kernel >= 4.7 + Note that kernel auditing is broken when used with systemd's container code. When using systemd in conjunction with containers, please make sure to either turn off auditing at From 8d95368210007e50df80a9d6e6b2b86010ce2585 Mon Sep 17 00:00:00 2001 From: Alan Jenkins Date: Wed, 17 Jan 2018 12:53:26 +0000 Subject: [PATCH 3/3] core: namespace: remove unnecessary mode on /dev/shm mount target This should have no behavioural effect; it just confused me. All the other mount directories in this function are created as 0755. Some of the mounts are allowed to fail - mqueue and hugepages. If the /dev/mqueue mount target was created with the permissive mode 01777, to match the filesystem we're trying to mount there, then a mount failure would allow unprivileged users to write to the /dev filesystem, e.g. to exhaust the available space. There is no reason to allow this. (Allowing the user read access (0755) seems a reasonable idea though, e.g. for quicker troubleshooting.) We do not allow failure of the /dev/shm mount, so it doesn't matter that it is created as 01777. But on the same grounds, we have no *reason* to create it as any specific mode. 0755 is equally fine. This function will be clearer by using 0755 throughout, to avoid unintentionally implying some connection between the mode of the mount target, and the mode of the mounted filesystem. --- src/core/namespace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/core/namespace.c b/src/core/namespace.c index e20c5007933..19678e6d57c 100644 --- a/src/core/namespace.c +++ b/src/core/namespace.c @@ -587,7 +587,7 @@ static int mount_private_dev(MountEntry *m) { } devshm = strjoina(temporary_mount, "/dev/shm"); - (void) mkdir(devshm, 01777); + (void) mkdir(devshm, 0755); r = mount("/dev/shm", devshm, NULL, MS_BIND, NULL); if (r < 0) { r = -errno;