linux/include
Christian Brauner b3e5838252
clone: add CLONE_PIDFD
This patchset makes it possible to retrieve pid file descriptors at
process creation time by introducing the new flag CLONE_PIDFD to the
clone() system call.  Linus originally suggested to implement this as a
new flag to clone() instead of making it a separate system call.  As
spotted by Linus, there is exactly one bit for clone() left.

CLONE_PIDFD creates file descriptors based on the anonymous inode
implementation in the kernel that will also be used to implement the new
mount api.  They serve as a simple opaque handle on pids.  Logically,
this makes it possible to interpret a pidfd differently, narrowing or
widening the scope of various operations (e.g. signal sending).  Thus, a
pidfd cannot just refer to a tgid, but also a tid, or in theory - given
appropriate flag arguments in relevant syscalls - a process group or
session. A pidfd does not represent a privilege.  This does not imply it
cannot ever be that way but for now this is not the case.

A pidfd comes with additional information in fdinfo if the kernel supports
procfs.  The fdinfo file contains the pid of the process in the callers
pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d".

As suggested by Oleg, with CLONE_PIDFD the pidfd is returned in the
parent_tidptr argument of clone.  This has the advantage that we can
give back the associated pid and the pidfd at the same time.

To remove worries about missing metadata access this patchset comes with
a sample program that illustrates how a combination of CLONE_PIDFD, and
pidfd_send_signal() can be used to gain race-free access to process
metadata through /proc/<pid>.  The sample program can easily be
translated into a helper that would be suitable for inclusion in libc so
that users don't have to worry about writing it themselves.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christian Brauner <christian@brauner.io>
Co-developed-by: Jann Horn <jannh@google.com>
Signed-off-by: Jann Horn <jannh@google.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: Andy Lutomirsky <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
2019-05-07 14:31:03 +02:00
..
acpi ACPI: use different default debug value than ACPICA 2019-03-25 10:45:59 +01:00
asm-generic syscalls: Remove start and number from syscall_set_arguments() args 2019-04-05 09:27:23 -04:00
clocksource
crypto
drm drm i915, amdgpu, qxl and etnaviv fixes 2019-03-15 13:58:35 -07:00
dt-bindings dt-bindings: reset: meson-g12a: Add missing USB2 PHY resets 2019-03-25 16:22:10 +01:00
keys Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-03-10 17:32:04 -07:00
kvm ARM: some cleanups, direct physical timer assignment, cache sanitization 2019-03-15 15:00:28 -07:00
linux clone: add CLONE_PIDFD 2019-05-07 14:31:03 +02:00
math-emu
media
memory
misc auxdisplay: charlcd: Introduce charlcd_free() helper 2019-03-17 08:48:16 +01:00
net net: sched: introduce and use qdisc tree flush/purge helpers 2019-04-01 14:50:13 -07:00
pcmcia
ras
rdma
scsi
soc IOMMU Updates for Linux v5.1 2019-03-10 12:29:52 -07:00
sound sound fixes for 5.1-rc1 2019-03-15 14:05:00 -07:00
target
trace syscalls: Remove start and number from syscall_get_arguments() args 2019-04-05 09:26:43 -04:00
uapi clone: add CLONE_PIDFD 2019-05-07 14:31:03 +02:00
video media updates for v5.1-rc1 2019-03-09 14:45:54 -08:00
xen