linux/Documentation/userspace-api
Jeff Xu 653c5c7511 mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC
When MFD_NOEXEC_SEAL was introduced, there was one big mistake: it didn't
have proper documentation.  This led to a lot of confusion, especially
about whether a memfd created with the MFD_NOEXEC_SEAL flag is sealable.
Before MFD_NOEXEC_SEAL, a memfd had to be created with MFD_ALLOW_SEALING
to be sealable, so it's a fair question.

As one might have noticed, unlike other flags in memfd_create,
MFD_NOEXEC_SEAL is actually a combination of multiple flags.  The idea is
to make it easier to use memfd in the most common way, which is NOEXEC +
F_SEAL_EXEC + MFD_ALLOW_SEALING.  This works with sysctl vm.memfd_noexec
to help existing applications move to a more secure way of using memfd.
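The combined behaviour described above can be checked from userspace.  The
following is a minimal sketch, not part of the patch: probe_noexec_seal() is
a hypothetical helper name, and the fallback #defines are assumptions for
toolchains whose headers predate the flag.  It creates a memfd with
MFD_NOEXEC_SEAL but without MFD_ALLOW_SEALING, adds one more seal, and
returns the resulting seal mask, or -1 if the running kernel rejects the
flag.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks (assumption: only needed when libc headers predate the flag). */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef F_SEAL_EXEC
#define F_SEAL_EXEC 0x0020
#endif

/*
 * Hypothetical helper: returns the seal mask of a fresh MFD_NOEXEC_SEAL
 * memfd after adding F_SEAL_SHRINK, or -1 if any step fails (for example
 * on a kernel that does not know MFD_NOEXEC_SEAL).
 */
int probe_noexec_seal(void)
{
	int seals;
	int fd = memfd_create("probe", MFD_CLOEXEC | MFD_NOEXEC_SEAL);

	if (fd < 0)
		return -1;

	/* MFD_ALLOW_SEALING was not passed, yet adding a seal should work. */
	if (fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK) < 0) {
		close(fd);
		return -1;
	}

	seals = fcntl(fd, F_GET_SEALS);
	close(fd);
	return seals;
}
```

On a kernel that supports the flag, the returned mask is expected to contain
F_SEAL_EXEC and F_SEAL_SHRINK but not F_SEAL_SEAL, which is what "sealable"
means here.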

Proposals have been made to make MFD_NOEXEC_SEAL non-sealable unless
MFD_ALLOW_SEALING is set, to be consistent with other flags [1].  Those
are based on the viewpoint that each flag is an atomic unit, which is a
reasonable assumption.  However, MFD_NOEXEC_SEAL was designed with the
intent of promoting the most secure method of using memfd, and therefore
combines multiple functionalities into one bit.

Furthermore, MFD_NOEXEC_SEAL was added more than a year ago, and multiple
applications and distributions have backported and used it.  Altering the
ABI now carries a degree of risk and may lead to disruption.

MFD_NOEXEC_SEAL is a new flag, and applications must change their code to
use it.  There is no backward compatibility problem.

When sysctl vm.memfd_noexec == 1 or 2, applications that set neither
MFD_NOEXEC_SEAL nor MFD_EXEC will get an MFD_NOEXEC_SEAL memfd.  An old
application might break; that is by design, and on such a system
vm.memfd_noexec = 0 should be used.  This also poses no backward
compatibility problem.
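The defaulting rule in the paragraph above can be written as a small pure
function.  This is a simplified userspace model under stated assumptions,
not the kernel's code: model_default_exec_flag() and the MODEL_* macros are
hypothetical names, the sysctl is assumed to be the mainline vm.memfd_noexec
with levels 0, 1 and 2, and level 0 is assumed to keep the legacy executable
default.  It models only which flag is applied when the caller passes
neither one.

```c
#include <assert.h>

/* Model copies of the UAPI flag values (assumed: 0x0008 and 0x0010). */
#define MODEL_MFD_NOEXEC_SEAL 0x0008U
#define MODEL_MFD_EXEC        0x0010U

/*
 * Hypothetical model: given the flags passed to memfd_create() and the
 * sysctl level, return the flags with the default exec policy filled in.
 */
unsigned int model_default_exec_flag(unsigned int flags, int memfd_noexec)
{
	assert(memfd_noexec >= 0 && memfd_noexec <= 2);

	/* An explicit choice by the application is left untouched. */
	if (flags & (MODEL_MFD_EXEC | MODEL_MFD_NOEXEC_SEAL))
		return flags;

	/* Levels 1 and 2: an unflagged memfd becomes MFD_NOEXEC_SEAL. */
	if (memfd_noexec >= 1)
		return flags | MODEL_MFD_NOEXEC_SEAL;

	/* Level 0: legacy behaviour, the memfd stays executable. */
	return flags | MODEL_MFD_EXEC;
}
```

For example, model_default_exec_flag(0, 1) yields MODEL_MFD_NOEXEC_SEAL,
while model_default_exec_flag(0, 0) yields MODEL_MFD_EXEC, which is why an
old application that relies on an executable memfd needs level 0.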

I propose to include this documentation patch to assist in clarifying the
semantics of MFD_NOEXEC_SEAL, thereby preventing any potential future
confusion.

Finally, I would like to express my gratitude to David Rheinsberg and
Barnabás Pőcze for initiating the discussion on the topic of sealability.

[1]
https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/

[jeffxu@chromium.org: updates per Randy]
  Link: https://lkml.kernel.org/r/20240611034903.3456796-2-jeffxu@chromium.org
[jeffxu@chromium.org: v3]
  Link: https://lkml.kernel.org/r/20240611231409.3899809-2-jeffxu@chromium.org
Link: https://lkml.kernel.org/r/20240607203543.2151433-2-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@chromium.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Barnabás Pőcze <pobrn@protonmail.com>
Cc: Daniel Verkamp <dverkamp@chromium.org>
Cc: David Rheinsberg <david@readahead.eu>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-15 10:43:07 -07:00
accelerators Documentation: ocxl.rst: change FPGA indirect article to an 2021-06-09 14:51:25 +02:00
ebpf
gpio Documentation: gpio: fix typo 2024-04-02 10:50:28 +02:00
ioctl ntsync: Introduce NTSYNC_IOC_CREATE_SEM. 2024-04-11 15:34:38 +02:00
media media: Documentation: v4l: Fix ACTIVE route flag 2024-05-28 08:00:14 +02:00
netlink ynl: support binary and integer sub-type for indexed-array 2024-04-05 22:32:49 -07:00
dcdbas.rst Documentation: move driver-api/dcdbas to userspace-api/ 2024-01-03 14:17:40 -07:00
dma-buf-alloc-exchange.rst doc: uapi: Add document describing dma-buf semantics 2023-08-21 18:20:05 +02:00
ELF.rst ELF: document some de-facto PT_* ABI quirks 2023-04-20 17:53:38 -06:00
futex2.rst futex2: Documentation: Document sys_futex_waitv() uAPI 2021-10-07 13:51:13 +02:00
index.rst mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC 2024-06-15 10:43:07 -07:00
iommu.rst
iommufd.rst Documentation: userspace-api: correct spelling 2023-02-02 11:07:18 -07:00
isapnp.rst Documentation: move driver-api/isapnp to userspace-api/ 2024-01-03 14:17:39 -07:00
landlock.rst landlock: Document IOCTL support 2024-05-13 06:58:34 +02:00
lsm.rst LSM: Create lsm_list_modules system call 2023-11-12 22:54:42 -05:00
mfd_noexec.rst mm/memfd: add documentation for MFD_NOEXEC_SEAL MFD_EXEC 2024-06-15 10:43:07 -07:00
mseal.rst mseal: add documentation 2024-05-23 19:40:26 -07:00
no_new_privs.rst
perf_ring_buffer.rst Documentation: userspace-api: Document perf ring buffer mechanism 2024-01-30 13:49:02 -07:00
seccomp_filter.rst Documentation: userspace-api: correct spelling 2023-02-02 11:07:18 -07:00
spec_ctrl.rst Documentation: Add L1D flushing Documentation 2021-07-28 11:42:25 +02:00
sysfs-platform_profile.rst Documentation: userspace-api: correct spelling 2023-02-02 11:07:18 -07:00
tee.rst Documentation: Destage TEE subsystem documentation 2023-12-08 15:45:10 -07:00
unshare.rst
vduse.rst VDUSE: fix documentation underline warning 2021-10-13 08:42:07 -04:00