f008b1d6e1
75434 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
f008b1d6e1 |
Netfs prep for write helpers
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmI1HOwACgkQ+7dXa6fL C2u9mA/+LUdXHqlvET/PAtFTg75bUPeOFGLnuDnYl1Ng2FCKMSodAohpbVtENxsK E/gTVS7uiVZFQgC+YmNA00z6eIQkAaDVyvKyEcUbKREBbUgONfJ/HLeaK/NvVKxx TY5gx/POdG6yHRQXL6JGBqSJUB8bZrGKwnJm8ebzeKOji9n7GSJBYiMlYBA7EAhs Aut/P7Y39ISHLw3y+y5czBeRoubljmTyznbP20xUZEzrRwhTpNwpJVzBGUZU635T 93Sqcp//0U5LIdn6Pg6DUGHBMBTNDNJChb21ZoBusF/HHswXsOOnf/mcRUBSJUTI M1WSpNLk8PRBgajMdIymQpGU1sCZZzJ3krrSA3RcXdN6GPHwZg8kKjoroHsLDL6l igPbDSMJ5wfiwA2A2gXbY1CkAl3ik5ccb7ZqhTwS0WBk0vOnHmAsE9cs/bBo7Xii GTiWXEFOgtJiXANPMS2P9DiOS3ZQNf+wxotCYdkGPOXuX9wnIo1Kmy8XfujQ1bXf pJsEZKfeyROKrzyKWgqLI64/Kg5xNueoFQZfDpOlZYzF1uDstynADPUt0eQD706q jcuKaXLN3rn5gSPun5mWOYbRtXVgOLdFL/7zptMVJwFKBFguQENhjG4UMNZcjkVA 3Mr0kGocsgoCSk1oDBkFlrw1wIsXxWbkRBL1Pww6kovivuGUwoo= =j0yx -----END PGP SIGNATURE----- Merge tag 'netfs-prep-20220318' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull netfs updates from David Howells: "Netfs prep for write helpers. Having had a go at implementing write helpers and content encryption support in netfslib, it seems that the netfs_read_{,sub}request structs and the equivalent write request structs were almost the same and so should be merged, thereby requiring only one set of alloc/get/put functions and a common set of tracepoints. Merging the structs also has the advantage that if a bounce buffer is added to the request struct, a read operation can be performed to fill the bounce buffer, the contents of the buffer can be modified and then a write operation can be performed on it to send the data wherever it needs to go using the same request structure all the way through. The I/O handlers would then transparently perform any required crypto. This should make it easier to perform RMW cycles if needed. The potentially common functions and structs, however, by their names all proclaim themselves to be associated with the read side of things. The bulk of these changes alter this in the following ways: - Rename struct netfs_read_{,sub}request to netfs_io_{,sub}request. - Rename some enums, members and flags to make them more appropriate. - Adjust some comments to match. - Drop "read"/"rreq" from the names of common functions. For instance, netfs_get_read_request() becomes netfs_get_request(). - The ->init_rreq() and ->issue_op() methods become ->init_request() and ->issue_read(). I've kept the latter as a read-specific function and in another branch added an ->issue_write() method. The driver source is then reorganised into a number of files: fs/netfs/buffered_read.c Create read reqs to the pagecache fs/netfs/io.c Dispatchers for read and write reqs fs/netfs/main.c Some general miscellaneous bits fs/netfs/objects.c Alloc, get and put functions fs/netfs/stats.c Optional procfs statistics. and future development can be fitted into this scheme, e.g.: fs/netfs/buffered_write.c Modify the pagecache fs/netfs/buffered_flush.c Writeback from the pagecache fs/netfs/direct_read.c DIO read support fs/netfs/direct_write.c DIO write support fs/netfs/unbuffered_write.c Write modifications directly back Beyond the above changes, there are also some changes that affect how things work: - Make fscache_end_operation() generally available. - In the netfs tracing header, generate enums from the symbol -> string mapping tables rather than manually coding them. - Add a struct for filesystems that uses netfslib to put into their inode wrapper structs to hold extra state that netfslib is interested in, such as the fscache cookie. This allows netfslib functions to be set in filesystem operation tables and jumped to directly without having to have a filesystem wrapper. - Add a member to the struct added above to track the remote inode length as that may differ if local modifications are buffered. We may need to supply an appropriate EOF pointer when storing data (in AFS for example). - Pass extra information to netfs_alloc_request() so that the ->init_request() hook can access it and retain information to indicate the origin of the operation. - Make the ->init_request() hook return an error, thereby allowing a filesystem that isn't allowed to cache an inode (ceph or cifs, for example) to skip readahead. - Switch to using refcount_t for subrequests and add tracepoints to log refcount changes for the request and subrequest structs. - Add a function to consolidate dispatching a read request. Similar code is used in three places and another couple are likely to be added in the future" Link: https://lore.kernel.org/all/2639515.1648483225@warthog.procyon.org.uk/ * tag 'netfs-prep-20220318' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: afs: Maintain netfs_i_context::remote_i_size netfs: Keep track of the actual remote file size netfs: Split some core bits out into their own file netfs: Split fs/netfs/read_helper.c netfs: Rename read_helper.c to io.c netfs: Prepare to split read_helper.c netfs: Add a function to consolidate beginning a read netfs: Add a netfs inode context ceph: Make ceph_init_request() check caps on readahead netfs: Change ->init_request() to return an error code netfs: Refactor arguments for netfs_alloc_read_request netfs: Adjust the netfs_failure tracepoint to indicate non-subreq lines netfs: Trace refcounting on the netfs_io_subrequest struct netfs: Trace refcounting on the netfs_io_request struct netfs: Adjust the netfs_rreq tracepoint slightly netfs: Split netfs_io_* object handling out netfs: Finish off rename of netfs_read_request to netfs_io_request netfs: Rename netfs_read_*request to netfs_io_*request netfs: Generate enums from trace symbol mapping lists fscache: export fscache_end_operation() |
||
Linus Torvalds
|
b8321ed4a4 |
Kbuild updates for v5.18
- Add new environment variables, USERCFLAGS and USERLDFLAGS to allow additional flags to be passed to user-space programs. - Fix missing fflush() bugs in Kconfig and fixdep - Fix a minor bug in the comment format of the .config file - Make kallsyms ignore llvm's local labels, .L* - Fix UAPI compile-test for cross-compiling with Clang - Extend the LLVM= syntax to support LLVM=<suffix> form for using a particular version of LLVm, and LLVM=<prefix> form for using custom LLVM in a particular directory path. - Clean up Makefiles -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmJFGloVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsGH0kP/j6Vx5BqEv3tP2Q+UANxLqITleJs IFpbSesz/BhlG7I/IapWmCDSqFbYd5uJTO4ko8CsPmZHcxr6Gw3y+DN5yQACKaG/ p9xiF6GjPyKR8+VdcT2tV50+dVY8ANe/DxCyzKrJd/uyYxgARPKJh0KRMNz+d9lj ixUpCXDhx/XlKzPIlcxrvhhjevKz+NnHmN0fe6rzcOw9KzBGBTsf20Q3PqUuBOKa rWHsRGcBPA8eKLfWT1Us1jjic6cT2g4aMpWjF20YgUWKHgWVKcNHpxYKGXASVo/z ewdDnNfmwo7f7fKMCDDro9iwFWV/BumGtn43U00tnqdBcTpFojPlEOga37UPbZDF nmTblGVUhR0vn4PmfBy8WkAkbW+IpVatKwJGV4J3KjSvdWvZOmVj9VUGLVAR0TXW /YcgRs6EtG8Hn0IlCj0fvZ5wRWoDLbP2DSZ67R/44EP0GaNQPwUe4FI1izEE4EYX oVUAIxcKixWGj4RmdtmtMMdUcZzTpbgS9uloMUmS3u9LK0Ir/8tcWaf2zfMO6Jl2 p4Q31s1dUUKCnFnj0xDKRyKGUkxYebrHLfuBqi0RIc0xRpSlxoXe3Dynm9aHEQoD ZSV0eouQJxnaxM1ck5Bu4AHLgEebHfEGjWVyUHno7jFU5EI9Wpbqpe4pCYEEDTm1 +LJMEpdZO0dFvpF+ =84rW -----END PGP SIGNATURE----- Merge tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Add new environment variables, USERCFLAGS and USERLDFLAGS to allow additional flags to be passed to user-space programs. - Fix missing fflush() bugs in Kconfig and fixdep - Fix a minor bug in the comment format of the .config file - Make kallsyms ignore llvm's local labels, .L* - Fix UAPI compile-test for cross-compiling with Clang - Extend the LLVM= syntax to support LLVM=<suffix> form for using a particular version of LLVm, and LLVM=<prefix> form for using custom LLVM in a particular directory path. - Clean up Makefiles * tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: Make $(LLVM) more flexible kbuild: add --target to correctly cross-compile UAPI headers with Clang fixdep: use fflush() and ferror() to ensure successful write to files arch: syscalls: simplify uapi/kapi directory creation usr/include: replace extra-y with always-y certs: simplify empty certs creation in certs/Makefile certs: include certs/signing_key.x509 unconditionally kallsyms: ignore all local labels prefixed by '.L' kconfig: fix missing '# end of' for empty menu kconfig: add fflush() before ferror() check kbuild: replace $(if A,A,B) with $(or A,B) kbuild: Add environment variables for userprogs flags kbuild: unify cmd_copy and cmd_shipped |
||
Linus Torvalds
|
d888c83fce |
fs: fix fd table size alignment properly
Jason Donenfeld reports that my commit |
||
Linus Torvalds
|
965181d7ef |
NFS client updates for Linux 5.18
Highlights include: Features: - Switch NFS to use readahead instead of the obsolete readpages. - Readdir fixes to improve cacheability of large directories when there are multiple readers and writers. - Readdir performance improvements when doing a seekdir() immediately after opening the directory (common when re-exporting NFS). - NFS swap improvements from Neil Brown. - Loosen up memory allocation to permit direct reclaim and write back in cases where there is no danger of deadlocking the writeback code or NFS swap. - Avoid sillyrename when the NFSv4 server claims to support the necessary features to recover the unlinked but open file after reboot. Bugfixes: - Patch from Olga to add a mount option to control NFSv4.1 session trunking discovery, and default it to being off. - Fix a lockup in nfs_do_recoalesce(). - Two fixes for list iterator variables being used when pointing to the list head. - Fix a kernel memory scribble when reading from a non-socket transport in /sys/kernel/sunrpc. - Fix a race where reconnecting to a server could leave the TCP socket stuck forever in the connecting state. - Patch from Neil to fix a shutdown race which can leave the SUNRPC transport timer primed after we free the struct xprt itself. - Patch from Xin Xiong to fix reference count leaks in the NFSv4.2 copy offload. - Sunrpc patch from Olga to avoid resending a task on an offlined transport. Cleanups: - Patches from Dave Wysochanski to clean up the fscache code -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmJDLLUACgkQZwvnipYK APJZ5Q/9FH5S38RcUOxpFS+XaVdmxltJTMQv9Bz5mcyljq6PwRjTn73Qs7MfBQfD k90O9Pn/EXgQpFWNW4o/saQGLAO1MbVaesdNNF92Le9NziMz7lr95fsFXOqpg9EQ rEVOImWinv/N0+P0yv5tgjx6Fheu70L4dnwsxepnmtv2R1VjXOVchK7WjsV+9qey Xvn15OzhtVCGhRvvKfmGuGW+2I0D29tRNn2X0Pi+LjxAEQ0We/rcsJeGDMGYtbeC 2U4ADk7KB0kJq04WYCH8k8tTrljL3PUONzO6QP1lltmAZ0FOwXf/AjyaD3SML2Xu A1yc2bEn3/S6wCQ3BiMxoE7UCSewg/ZyQD7B9GjZ/nWYNXREhLKvtVZt+ULTCA6o UKAI2vlEHljzQI2YfbVVZFyL8KCZqi+BFcJ/H/dtFKNazIkiefkaF+703Pflazis HjhA5G6IT0Pk6qZ6OBp+Z/4kCDYZYXXi9ysiNuMPWCiYEfZzt7ocUjm44uRAi+vR Qpf0V01Rg4iplTj3YFXqb9TwxrudzC7AcQ7hbLJia0kD/8djG+aGZs3SJWeO+VuE Xa41oxAx9jqOSRvSKPvj7nbexrwn8PtQI2NtDRqjPXlg6+Jbk23u8lIEVhsCH877 xoMUHtTKLyqECskzRN4nh7qoBD3QfiLuI74aDcPtGEuXzU1+Ox8= =jeGt -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Features: - Switch NFS to use readahead instead of the obsolete readpages. - Readdir fixes to improve cacheability of large directories when there are multiple readers and writers. - Readdir performance improvements when doing a seekdir() immediately after opening the directory (common when re-exporting NFS). - NFS swap improvements from Neil Brown. - Loosen up memory allocation to permit direct reclaim and write back in cases where there is no danger of deadlocking the writeback code or NFS swap. - Avoid sillyrename when the NFSv4 server claims to support the necessary features to recover the unlinked but open file after reboot. Bugfixes: - Patch from Olga to add a mount option to control NFSv4.1 session trunking discovery, and default it to being off. - Fix a lockup in nfs_do_recoalesce(). - Two fixes for list iterator variables being used when pointing to the list head. - Fix a kernel memory scribble when reading from a non-socket transport in /sys/kernel/sunrpc. - Fix a race where reconnecting to a server could leave the TCP socket stuck forever in the connecting state. - Patch from Neil to fix a shutdown race which can leave the SUNRPC transport timer primed after we free the struct xprt itself. - Patch from Xin Xiong to fix reference count leaks in the NFSv4.2 copy offload. - Sunrpc patch from Olga to avoid resending a task on an offlined transport. Cleanups: - Patches from Dave Wysochanski to clean up the fscache code" * tag 'nfs-for-5.18-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (91 commits) NFSv4/pNFS: Fix another issue with a list iterator pointing to the head NFS: Don't loop forever in nfs_do_recoalesce() SUNRPC: Don't return error values in sysfs read of closed files SUNRPC: Do not dereference non-socket transports in sysfs NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error SUNRPC don't resend a task on an offlined transport NFS: replace usage of found with dedicated list iterator variable SUNRPC: avoid race between mod_timer() and del_timer_sync() pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod pNFS/flexfiles: Ensure pNFS allocation modes are consistent with nfsiod NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod NFS: Avoid writeback threads getting stuck in mempool_alloc() NFS: nfsiod should not block forever in mempool_alloc() SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent SUNRPC: Fix unx_lookup_cred() allocation NFS: Fix memory allocation in rpc_alloc_task() NFS: Fix memory allocation in rpc_malloc() SUNRPC: Improve accuracy of socket ENOBUFS determination SUNRPC: Replace internal use of SOCKWQ_ASYNC_NOSPACE SUNRPC: Fix socket waits for write buffer space ... |
||
Linus Torvalds
|
1ec48f9551 |
A couple bug fixes
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAmJDGQUACgkQNqiEXrVA jGSENg/5AT5rY7GwLKUrzGcCrikLVt91iftSKfcEtCuZHQiQn/kdpxGwUWW3GJ6y GvYKyyUumd6r1WKZArhGwZ/4R/pu0sNSWB6C/cxWqr5r3vJ74cVtsyc49HZvy2w5 bBoftcDgJPIDPj2POYFbXR14F9D9MjJbcS6Zie5blWhjwNtxTCfc6AwgV5G5cYbo X4r/RawUGuJsOLqGjro48v5fVW1za6g8h03y3iWzOcGPn7lsgUaIWzC+ZxK167C7 qrc4poWpyVcxtCzA+Fwda1VrPYfQWagZcSp0r0bMe2tNw7rvTJKtWZPiIeDlhIhW 12kbKA/UeJDTMCI4ZA16gUBW0BUvbKRpDYCGtC8OIj64XUblmSOuXHFj71AG7qPI X+Lj/1ZxQ1h31bHw5ay859hr36G5EH2yutH5v7wil9yiuBScj2D/D6HC5z7GVb4O B5PLRJCRW+Hc86vWUg0hutfwqWmNfcvtxs6nqY111EPV7HShBKsSVhyOrHL/mQOo byiFUcYMhbDcrI2dkl8HvELPGMRwn6juw1qL/GlgmCFG7VfR7GW3ilWS8KmjCaTb iUoXeNriab3mUkAH9N1+jC6HbRJQnwj8/Exx66CLecqsLeeAIQryPzvAWE8qEJMY cTzAUZqp67baoCZ/dmY7AuExkURFc++bu+ulgvye63QIyRwwCBw= =ppBo -----END PGP SIGNATURE----- Merge tag 'jfs-5.18' of https://github.com/kleikamp/linux-shaggy Pull jfs updates from Dave Kleikamp: "A couple bug fixes" * tag 'jfs-5.18' of https://github.com/kleikamp/linux-shaggy: jfs: prevent NULL deref in diFree jfs: fix divide error in dbNextAG |
||
Linus Torvalds
|
1c24a18639 |
fs: fd tables have to be multiples of BITS_PER_LONG
This has always been the rule: fdtables have several bitmaps in them,
and as a result they have to be sized properly for bitmaps. We walk
those bitmaps in chunks of 'unsigned long' in serveral cases, but even
when we don't, we use the regular kernel bitops that are defined to work
on arrays of 'unsigned long', not on some byte array.
Now, the distinction between arrays of bytes and 'unsigned long'
normally only really ends up being noticeable on big-endian systems, but
Fedor Pchelkin and Alexey Khoroshilov reported that copy_fd_bitmaps()
could be called with an argument that wasn't even a multiple of
BITS_PER_BYTE. And then it fails to do the proper copy even on
little-endian machines.
The bug wasn't in copy_fd_bitmap(), but in sane_fdtable_size(), which
didn't actually sanitize the fdtable size sufficiently, and never made
sure it had the proper BITS_PER_LONG alignment.
That's partly because the alignment historically came not from having to
explicitly align things, but simply from previous fdtable sizes, and
from count_open_files(), which counts the file descriptors by walking
them one 'unsigned long' word at a time and thus naturally ends up doing
sizing in the proper 'chunks of unsigned long'.
But with the introduction of close_range(), we now have an external
source of "this is how many files we want to have", and so
sane_fdtable_size() needs to do a better job.
This also adds that explicit alignment to alloc_fdtable(), although
there it is mainly just for documentation at a source code level. The
arithmetic we do there to pick a reasonable fdtable size already aligns
the result sufficiently.
In fact,clang notices that the added ALIGN() in that function doesn't
actually do anything, and does not generate any extra code for it.
It turns out that gcc ends up confusing itself by combining a previous
constant-sized shift operation with the variable-sized shift operations
in roundup_pow_of_two(). And probably due to that doesn't notice that
the ALIGN() is a no-op. But that's a (tiny) gcc misfeature that doesn't
matter. Having the explicit alignment makes sense, and would actually
matter on a 128-bit architecture if we ever go there.
This also adds big comments above both functions about how fdtable sizes
have to have that BITS_PER_LONG alignment.
Fixes:
|
||
Linus Torvalds
|
1930a6e739 |
ptrace: Cleanups for v5.18
This set of changes removes tracehook.h, moves modification of all of the ptrace fields inside of siglock to remove races, adds a missing permission check to ptrace.c The removal of tracehook.h is quite significant as it has been a major source of confusion in recent years. Much of that confusion was around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the semantics clearer). For people who don't know tracehook.h is a vestiage of an attempt to implement uprobes like functionality that was never fully merged, and was later superseeded by uprobes when uprobes was merged. For many years now we have been removing what tracehook functionaly a little bit at a time. To the point where now anything left in tracehook.h is some weird strange thing that is difficult to understand. Eric W. Biederman (15): ptrace: Move ptrace_report_syscall into ptrace.h ptrace/arm: Rename tracehook_report_syscall report_syscall ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h ptrace: Remove arch_syscall_{enter,exit}_tracehook ptrace: Remove tracehook_signal_handler task_work: Remove unnecessary include from posix_timers.h task_work: Introduce task_work_pending task_work: Call tracehook_notify_signal from get_signal on all architectures task_work: Decouple TIF_NOTIFY_SIGNAL and task_work signal: Move set_notify_signal and clear_notify_signal into sched/signal.h resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume resume_user_mode: Move to resume_user_mode.h tracehook: Remove tracehook.h ptrace: Move setting/clearing ptrace_message into ptrace_stop ptrace: Return the signal to continue with from ptrace_stop Jann Horn (1): ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE Yang Li (1): ptrace: Remove duplicated include in ptrace.c MAINTAINERS | 1 - arch/Kconfig | 5 +- arch/alpha/kernel/ptrace.c | 5 +- arch/alpha/kernel/signal.c | 4 +- arch/arc/kernel/ptrace.c | 5 +- arch/arc/kernel/signal.c | 4 +- arch/arm/kernel/ptrace.c | 12 +- arch/arm/kernel/signal.c | 4 +- arch/arm64/kernel/ptrace.c | 14 +-- arch/arm64/kernel/signal.c | 4 +- arch/csky/kernel/ptrace.c | 5 +- arch/csky/kernel/signal.c | 4 +- arch/h8300/kernel/ptrace.c | 5 +- arch/h8300/kernel/signal.c | 4 +- arch/hexagon/kernel/process.c | 4 +- arch/hexagon/kernel/signal.c | 1 - arch/hexagon/kernel/traps.c | 6 +- arch/ia64/kernel/process.c | 4 +- arch/ia64/kernel/ptrace.c | 6 +- arch/ia64/kernel/signal.c | 1 - arch/m68k/kernel/ptrace.c | 5 +- arch/m68k/kernel/signal.c | 4 +- arch/microblaze/kernel/ptrace.c | 5 +- arch/microblaze/kernel/signal.c | 4 +- arch/mips/kernel/ptrace.c | 5 +- arch/mips/kernel/signal.c | 4 +- arch/nds32/include/asm/syscall.h | 2 +- arch/nds32/kernel/ptrace.c | 5 +- arch/nds32/kernel/signal.c | 4 +- arch/nios2/kernel/ptrace.c | 5 +- arch/nios2/kernel/signal.c | 4 +- arch/openrisc/kernel/ptrace.c | 5 +- arch/openrisc/kernel/signal.c | 4 +- arch/parisc/kernel/ptrace.c | 7 +- arch/parisc/kernel/signal.c | 4 +- arch/powerpc/kernel/ptrace/ptrace.c | 8 +- arch/powerpc/kernel/signal.c | 4 +- arch/riscv/kernel/ptrace.c | 5 +- arch/riscv/kernel/signal.c | 4 +- arch/s390/include/asm/entry-common.h | 1 - arch/s390/kernel/ptrace.c | 1 - arch/s390/kernel/signal.c | 5 +- arch/sh/kernel/ptrace_32.c | 5 +- arch/sh/kernel/signal_32.c | 4 +- arch/sparc/kernel/ptrace_32.c | 5 +- arch/sparc/kernel/ptrace_64.c | 5 +- arch/sparc/kernel/signal32.c | 1 - arch/sparc/kernel/signal_32.c | 4 +- arch/sparc/kernel/signal_64.c | 4 +- arch/um/kernel/process.c | 4 +- arch/um/kernel/ptrace.c | 5 +- arch/x86/kernel/ptrace.c | 1 - arch/x86/kernel/signal.c | 5 +- arch/x86/mm/tlb.c | 1 + arch/xtensa/kernel/ptrace.c | 5 +- arch/xtensa/kernel/signal.c | 4 +- block/blk-cgroup.c | 2 +- fs/coredump.c | 1 - fs/exec.c | 1 - fs/io-wq.c | 6 +- fs/io_uring.c | 11 +- fs/proc/array.c | 1 - fs/proc/base.c | 1 - include/asm-generic/syscall.h | 2 +- include/linux/entry-common.h | 47 +------- include/linux/entry-kvm.h | 2 +- include/linux/posix-timers.h | 1 - include/linux/ptrace.h | 81 ++++++++++++- include/linux/resume_user_mode.h | 64 ++++++++++ include/linux/sched/signal.h | 17 +++ include/linux/task_work.h | 5 + include/linux/tracehook.h | 226 ----------------------------------- include/uapi/linux/ptrace.h | 2 +- kernel/entry/common.c | 19 +-- kernel/entry/kvm.c | 9 +- kernel/exit.c | 3 +- kernel/livepatch/transition.c | 1 - kernel/ptrace.c | 47 +++++--- kernel/seccomp.c | 1 - kernel/signal.c | 62 +++++----- kernel/task_work.c | 4 +- kernel/time/posix-cpu-timers.c | 1 + mm/memcontrol.c | 2 +- security/apparmor/domain.c | 1 - security/selinux/hooks.c | 1 - 85 files changed, 372 insertions(+), 495 deletions(-) Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmJCQkoACgkQC/v6Eiaj j0DCWQ/5AZVFU+hX32obUNCLackHTwgcCtSOs3JNBmNA/zL/htPiYYG0ghkvtlDR Dw5J5DnxC6P7PVAdAqrpvx2uX2FebHYU0bRlyLx8LYUEP5dhyNicxX9jA882Z+vw Ud0Ue9EojwGWS76dC9YoKUj3slThMATbhA2r4GVEoof8fSNJaBxQIqath44t0FwU DinWa+tIOvZANGBZr6CUUINNIgqBIZCH/R4h6ArBhMlJpuQ5Ufk2kAaiWFwZCkX4 0LuuAwbKsCKkF8eap5I2KrIg/7zZVgxAg9O3cHOzzm8OPbKzRnNnQClcDe8perqp S6e/f3MgpE+eavd1EiLxevZ660cJChnmikXVVh8ZYYoefaMKGqBaBSsB38bNcLjY 3+f2dB+TNBFRnZs1aCujK3tWBT9QyjZDKtCBfzxDNWBpXGLhHH6j6lA5Lj+Cef5K /HNHFb+FuqedlFZh5m1Y+piFQ70hTgCa2u8b+FSOubI2hW9Zd+WzINV0ANaZ2LvZ 4YGtcyDNk1q1+c87lxP9xMRl/xi6rNg+B9T2MCo4IUnHgpSVP6VEB3osgUmrrrN0 eQlUI154G/AaDlqXLgmn1xhRmlPGfmenkxpok1AuzxvNJsfLKnpEwQSc13g3oiZr disZQxNY0kBO2Nv3G323Z6PLinhbiIIFez6cJzK5v0YJ2WtO3pY= =uEro -----END PGP SIGNATURE----- Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ptrace cleanups from Eric Biederman: "This set of changes removes tracehook.h, moves modification of all of the ptrace fields inside of siglock to remove races, adds a missing permission check to ptrace.c The removal of tracehook.h is quite significant as it has been a major source of confusion in recent years. Much of that confusion was around task_work and TIF_NOTIFY_SIGNAL (which I have now decoupled making the semantics clearer). For people who don't know tracehook.h is a vestiage of an attempt to implement uprobes like functionality that was never fully merged, and was later superseeded by uprobes when uprobes was merged. For many years now we have been removing what tracehook functionaly a little bit at a time. To the point where anything left in tracehook.h was some weird strange thing that was difficult to understand" * tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ptrace: Remove duplicated include in ptrace.c ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE ptrace: Return the signal to continue with from ptrace_stop ptrace: Move setting/clearing ptrace_message into ptrace_stop tracehook: Remove tracehook.h resume_user_mode: Move to resume_user_mode.h resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume signal: Move set_notify_signal and clear_notify_signal into sched/signal.h task_work: Decouple TIF_NOTIFY_SIGNAL and task_work task_work: Call tracehook_notify_signal from get_signal on all architectures task_work: Introduce task_work_pending task_work: Remove unnecessary include from posix_timers.h ptrace: Remove tracehook_signal_handler ptrace: Remove arch_syscall_{enter,exit}_tracehook ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h ptrace/arm: Rename tracehook_report_syscall report_syscall ptrace: Move ptrace_report_syscall into ptrace.h |
||
Linus Torvalds
|
266d17a8c0 |
Driver core changes for 5.18-rc1
Here is the set of driver core changes for 5.18-rc1. Not much here, primarily it was a bunch of cleanups and small updates: - kobj_type cleanups for default_groups - documentation updates - firmware loader minor changes - component common helper added and take advantage of it in many drivers (the largest part of this pull request). There will be a merge conflict in drivers/power/supply/ab8500_chargalg.c with your tree, the merge conflict should be easy (take all the changes). All of these have been in linux-next for a while with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYkG6PA8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ylMFwCfSIyAU4oLEgj+/Rfmx4o45cAVIWMAnit3zbdU wUUCGqKcOnTJEcW6dMPh =1VVi -----END PGP SIGNATURE----- Merge tag 'driver-core-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of driver core changes for 5.18-rc1. Not much here, primarily it was a bunch of cleanups and small updates: - kobj_type cleanups for default_groups - documentation updates - firmware loader minor changes - component common helper added and take advantage of it in many drivers (the largest part of this pull request). All of these have been in linux-next for a while with no reported problems" * tag 'driver-core-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (54 commits) Documentation: update stable review cycle documentation drivers/base/dd.c : Remove the initial value of the global variable Documentation: update stable tree link Documentation: add link to stable release candidate tree devres: fix typos in comments Documentation: add note block surrounding security patch note samples/kobject: Use sysfs_emit instead of sprintf base: soc: Make soc_device_match() simpler and easier to read driver core: dd: fix return value of __setup handler driver core: Refactor sysfs and drv/bus remove hooks driver core: Refactor multiple copies of device cleanup scripts: get_abi.pl: Fix typo in help message kernfs: fix typos in comments kernfs: remove unneeded #if 0 guard ALSA: hda/realtek: Make use of the helper component_compare_dev_name video: omapfb: dss: Make use of the helper component_compare_dev power: supply: ab8500: Make use of the helper component_compare_dev ASoC: codecs: wcd938x: Make use of the helper component_compare/release_of iommu/mediatek: Make use of the helper component_compare/release_of drm: of: Make use of the helper component_release_of ... |
||
Trond Myklebust
|
7c9d845f06 |
NFSv4/pNFS: Fix another issue with a list iterator pointing to the head
In nfs4_callback_devicenotify(), if we don't find a matching entry for
the deviceid, we're left with a pointer to 'struct nfs_server' that
actually points to the list of super blocks associated with our struct
nfs_client.
Furthermore, even if we have a valid pointer, nothing pins the super
block, and so the struct nfs_server could end up getting freed while
we're using it.
Since all we want is a pointer to the struct pnfs_layoutdriver_type,
let's skip all the iteration over super blocks, and just use APIs to
find the layout driver directly.
Reported-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Fixes:
|
||
Linus Torvalds
|
7001052160 |
Add support for Intel CET-IBT, available since Tigerlake (11th gen), which is a
coarse grained, hardware based, forward edge Control-Flow-Integrity mechanism where any indirect CALL/JMP must target an ENDBR instruction or suffer #CP. Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation is limited to 2 instructions (and typically fewer) on branch targets not starting with ENDBR. CET-IBT also limits speculation of the next sequential instruction after the indirect CALL/JMP [1]. CET-IBT is fundamentally incompatible with retpolines, but provides, as described above, speculation limits itself. [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEv3OU3/byMaA0LqWJdkfhpEvA5LoFAmI/LI8VHHBldGVyekBp bmZyYWRlYWQub3JnAAoJEHZH4aRLwOS6ZnkP/2QCgQLTu6oRxv9O020CHwlaSEeD 1Hoy3loum5q5hAi1Ik3dR9p0H5u64c9qbrBVxaFoNKaLt5GKrtHaDSHNk2L/CFHX urpH65uvTLxbyZzcahkAahoJ71XU+m7PcrHLWMunw9sy10rExYVsUOlFyoyG6XCF BDCNZpdkC09ZM3vwlWGMZd5Pp+6HcZNPyoV9tpvWAS2l+WYFWAID7mflbpQ+tA8b y/hM6b3Ud0rT2ubuG1iUpopgNdwqQZ+HisMPGprh+wKZkYwS2l8pUTrz0MaBkFde go7fW16kFy2HQzGm6aIEBmfcg0palP/mFVaWP0zS62LwhJSWTn5G6xWBr3yxSsht 9gWCiI0oDZuTg698MedWmomdG2SK6yAuZuqmdKtLLoWfWgviPEi7TDFG/cKtZdAW ag8GM8T4iyYZzpCEcWO9GWbjo6TTGq30JBQefCBG47GjD0csv2ubXXx0Iey+jOwT x3E8wnv9dl8V9FSd/tMpTFmje8ges23yGrWtNpb5BRBuWTeuGiBPZED2BNyyIf+T dmewi2ufNMONgyNp27bDKopY81CPAQq9cVxqNm9Cg3eWPFnpOq2KGYEvisZ/rpEL EjMQeUBsy/C3AUFAleu1vwNnkwP/7JfKYpN00gnSyeQNZpqwxXBCKnHNgOMTXyJz beB/7u2KIUbKEkSN =jZfK -----END PGP SIGNATURE----- Merge tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 CET-IBT (Control-Flow-Integrity) support from Peter Zijlstra: "Add support for Intel CET-IBT, available since Tigerlake (11th gen), which is a coarse grained, hardware based, forward edge Control-Flow-Integrity mechanism where any indirect CALL/JMP must target an ENDBR instruction or suffer #CP. Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation is limited to 2 instructions (and typically fewer) on branch targets not starting with ENDBR. CET-IBT also limits speculation of the next sequential instruction after the indirect CALL/JMP [1]. CET-IBT is fundamentally incompatible with retpolines, but provides, as described above, speculation limits itself" [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html * tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) kvm/emulate: Fix SETcc emulation for ENDBR x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld >= 14.0.0 x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang >= 14.0.0 kbuild: Fixup the IBT kbuild changes x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy x86: Remove toolchain check for X32 ABI capability x86/alternative: Use .ibt_endbr_seal to seal indirect calls objtool: Find unused ENDBR instructions objtool: Validate IBT assumptions objtool: Add IBT/ENDBR decoding objtool: Read the NOENDBR annotation x86: Annotate idtentry_df() x86,objtool: Move the ASM_REACHABLE annotation to objtool.h x86: Annotate call_on_stack() objtool: Rework ASM_REACHABLE x86: Mark __invalid_creds() __noreturn exit: Mark do_group_exit() __noreturn x86: Mark stop_this_cpu() __noreturn objtool: Ignore extra-symbol code objtool: Rename --duplicate to --lto ... |
||
Linus Torvalds
|
a060c9409e |
Fix buffered write page prefaulting
-----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmI90BMUHGFncnVlbmJh QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTpdexAApvfewSSFZ4J4mGqrGJcn1+iTkX1t ruDuDKOoRXbk5I+RCJKDJireIvdzElNE4f4rvgSk0Z+JMBaYcq4wI6tkuVqCs7uo ozrnY+Qfh2a9DA5+ghnNIdaow4v6FSrV6vvYA0ibf5D75r034j5PLnu9Ktj+rg6S 9dFe5lTHNaBcEg9VMChT2RU9ysmdlnFL155J2iJUWCnhDcCUW15YZeRRa2ECZTcZ FHrgNsoFuAzsbAqV58bwirEumbZbWjfCU0blrAu00rKuaWgba0fTXegWAqHBrpii 4hg3MJmttpg/hAUdxJFVCeE6Q/XR9nPkajIzmLxzw06Sb+YJMYpR64mno5vcQc64 eDFGmeG75XHEfjfzVb2rG1oTUy8x3E5TzsmGp+FTs9fxhwkFSbPhx2QReX/m1nvP 1X3JFrORTg3+lE2HBANftxWPMqNJmiuKEbQQvC8y0lgXMA3yGilnwG9eXYandKU1 AEXK8B0mOPvE+DSnmGYRmw/vMXFarZ7Ua3dFPFgWic7ai8J5UsPCgv/I1cB4Ku/X ZzOR8hbgVxDzDKC4nby3g4FSB5olRkFRYw+M+oinfJ3WfIoL0N11NxAI8OpXcVl2 Nr8/KnhK60CjgjnMAF871AS/qJnyjxYuPj+vDq46VoRQnFKoj0O/rLeokpEMGUET sitD2bxtoGrlpq8= =5EUY -----END PGP SIGNATURE----- Merge tag 'write-page-prefaulting' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull iomap fixlet from Andreas Gruenbacher: "Fix buffered write page prefaulting. I forgot to send it the previous merge window. I've only improved the patch description since" * tag 'write-page-prefaulting' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: fs/iomap: Fix buffered write page prefaulting |
||
Linus Torvalds
|
752d422e74 |
for-5.18/alloc-cleanups-2022-03-25
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI+UKAQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpkf8D/0WpfCaiaS3U269b3rX/PyM7LuQUAD9xMQI GUXX17VQvMPFvbDUgwWdkb+XqKvmJtPDPdpGqwzidWgzwjpJq/KsZkWILG8XpjKh ceMJ8qv6fhINAoyL+uB2Wne6YtLB0E9kqSGy72dv95TntUhajw9T7sqHGLN+HKo3 j70AzFMTwighMMbZqAUoP5coreHVrz4TApUKocQVu2bWVvvMkBj94tEWlVoGblsE 67bPRhCfvun6iO94uZErh4W9IeauyhKYQVBre+93loUerhEDWvLTOAZYftW1uPfG TheVSrYoJ3l+KcRxa1YSloAhmb3OX/YOYaEm4lr+sLg9yl6/aY6GvnfzwjL9o3kf mGp969Ro06e1DMgmyCfokar0jNVAqYMZN69eXONe2haU/J+RwX9qvkMs2EaPtwkO 7VQQY8x/FOrjobsYnNQU70sm8i7zu7piEbs9iTwILlFSzN85sl8aNlnOQYJWNzvq 7BUwry0eH2H6yLbLR+F9tw3IIbo/V/BWNf6ZqBojwIb2sDwP1jghPKWKyTBKjBfF TwjSLzMmwDaa27gt+zAizEJwmx9FGCMy4O527cnwF/+9nGJ6l8hZQtTXttgyPY87 o00/c5tgvJ15JQdsTGLeN6+nzlxylF6mVTWwUWW96E08YsARcXNbDZYf3UVXgoH/ YW7uBHaByQ== =k4vt -----END PGP SIGNATURE----- Merge tag 'for-5.18/alloc-cleanups-2022-03-25' of git://git.kernel.dk/linux-block Pull bio allocation fix from Jens Axboe: "We got some reports of users seeing: Unexpected gfp: 0x2 (__GFP_HIGHMEM). Fixing up to gfp: 0x1192888 which is a regression caused by the bio allocation cleanups" * tag 'for-5.18/alloc-cleanups-2022-03-25' of git://git.kernel.dk/linux-block: fs: do not pass __GFP_HIGHMEM to bio_alloc in do_mpage_readpage |
||
Linus Torvalds
|
561593a048 |
for-5.18/write-streams-2022-03-18
-----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI1AHwQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgplPjEACVJzKg5NkxpdkDThvq5tejws9KxB/4mHJg NoDMcv1TF+Orsd/HNW6XrgYnbU0ObHom3568/xb8BNegRVFe7V4ME/4IYNRyGOmV qbfciu04L1UkJhI52CIidkOioFABL3r1zgLCIz5vk0Cv9X7Le9x0UabHxJf7u9S+ Z3lNdyxezN0SGx8VT86l/7lSoHtG3VHO9IsQCuNGF02SB+6uGpXBlptbEoQ4nTxd T7/H9FNOe2Wf7eKvcOOds8UlvZYAfYcY0GcRrIOXdHIy25mKFWwn5cDgFTMOH5ID xXpm+JFkDkrfSW1o4FFPxbN9Z6RbVXbGCsrXlIragLO2MJQdXiIUxS1OPT5oAado H9MlX6QtkwziLW9zUWa/N/jmRjc2vzHAxD6JFg/wXxNdtY0kd8TQpaxwTB8mVDPN VCGutt7lJS1CQInQ+ppzbdqzzuLHC1RHAyWSmfUE9rb8cbjxtJBnSIorYRLUesMT GRwqVTXW0osxSgCb1iDiBCJANrX1yPZcemv4Wh1gzbT6IE9sWxWXsE5sy9KvswNc M+E4nu/TYYTfkynItJjLgmDLOoi+V0FBY6ba0mRPBjkriSP4AVlwsZLGVsAHQzuA o5paW1GjRCCwhIQ6+AzZIoOz6wqvprBlUgUkUneyYAQ2ZKC3pZi8zPnpoVdFucVa VaTzP71C1Q== =efaq -----END PGP SIGNATURE----- Merge tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block Pull NVMe write streams removal from Jens Axboe: "This removes the write streams support in NVMe. No vendor ever really shipped working support for this, and they are not interested in supporting it. With the NVMe support gone, we have nothing in the tree that supports this. Remove passing around of the hints. The only discussion point in this patchset imho is the fact that the file specific write hint setting/getting fcntl helpers will now return -1/EINVAL like they did before we supported write hints. No known applications use these functions, I only know of one prototype that I help do for RocksDB, and that's not used. That said, with a change like this, it's always a bit controversial. Alternatively, we could just make them return 0 and pretend it worked. It's placement based hints after all" * tag 'for-5.18/write-streams-2022-03-18' of git://git.kernel.dk/linux-block: fs: remove fs.f_write_hint fs: remove kiocb.ki_hint block: remove the per-bio/request write hint nvme: remove support or stream based temperature hint |
||
Trond Myklebust
|
d02d81efc7 |
NFS: Don't loop forever in nfs_do_recoalesce()
If __nfs_pageio_add_request() fails to add the request, it will return
with either desc->pg_error < 0, or mirror->pg_recoalesce will be set, so
we are guaranteed either to exit the function altogether, or to loop.
However if there is nothing left in mirror->pg_list to coalesce, we must
exit, so make sure that we clear mirror->pg_recoalesce every time we
loop.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes:
|
||
Linus Torvalds
|
a452c4eb40 |
\n
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmI7PZQACgkQnJ2qBz9k QNnW+QgA6c0mVip5ESE+RgRMLnajAFs0kmOzGEXpoa+QXvxrU3GuY5ssVAM4MlSx yzCnhlhHua3tci7rhTQ0pPrxBXStIxf/EqKB8y4ylwZhZAP3XdStvTsBizt1966m 1GyQiAHQFLsIsbbXfXcAVClYBHSkD8zAElFLvVB08q8zFXMpkC+2oh/gpwAlOPYz uczYatJV4edx57E3yX2lQJfDZK8I3OeDKeladDzCliTim8zR8sZTfZU5seiM7rHU 2XexWsM8tvUn33pvw6JE0GwOhp0ILBgMWxtA3Z1OVW+6+sngIQ4NsRRq/MaRy/ht zfOnNtPVyIf7DTgla0mqqPNM7/qZmQ== =AjOm -----END PGP SIGNATURE----- Merge tag 'fs_for_v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull reiserfs updates from Jan Kara: "The biggest change in this pull is the addition of a deprecation message about reiserfs with the outlook that we'd eventually be able to remove it from the kernel. Because it is practically unmaintained and untested and odd enough that people don't want to bother with it anymore... Otherwise there are small udf and ext2 fixes" * tag 'fs_for_v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: remove redundant assignment of variable etype reiserfs: Deprecate reiserfs ext2: correct max file size computing reiserfs: get rid of AOP_FLAG_CONT_EXPAND flag |
||
Linus Torvalds
|
a8988507e5 |
\n
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmI7PPUACgkQnJ2qBz9k QNm1ygf/WcyxdM4+FTrhVVFGKPriqM8Ftx2r88/sod8CPUZrmCtLbCtO4rVBAEip c1eVRFEcZV+A/pt/UVvIklb+/Vlc1RmP1pEBzOcDaDvOMcnBoV43rTHxwxrX1YdX Ykr6C37d+51FQu/cuVC/LHx2+MhIkAe2ebHbBZNVpk+s98iY9xjo+0cRaDbzgkXq NqySxDghZnRO4wvYbYYrStJ/XLJePjrBU1n18T0m7wpJ5UuyAnXlVwTZgkVhdNLy EnBUIRvhXxr3WzfH6I6G7fOHTSbYuho9CSK3K49TdiqAtoxrlWgX2LWE5WRzhHWd nlTu64JWeSMTIFCxxinxSs7F9nUiDg== =6VY+ -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "A few fsnotify improvements and cleanups" * tag 'fsnotify_for_v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: remove redundant parameter judgment fsnotify: optimize FS_MODIFY events with no ignored masks fsnotify: fix merge with parent's ignored mask |
||
Linus Torvalds
|
50560ce6a0 |
Kbuild -std=gnu11 updates for v5.18
Linus pointed out the benefits of C99 some years ago, especially variable declarations in loops [1]. At that time, we were not ready for the migration due to old compilers. Recently, Jakob Koschel reported a bug in list_for_each_entry(), which leaks the invalid pointer out of the loop [2]. In the discussion, we agreed that the time had come. Now that GCC 5.1 is the minimum compiler version, there is nothing to prevent us from going to -std=gnu99, or even straight to -std=gnu11. Discussions for a better list iterator implementation are ongoing, but this patch set must land first. [1] https://lore.kernel.org/all/CAHk-=wgr12JkKmRd21qh-se-_Gs69kbPgR9x4C+Es-yJV2GLkA@mail.gmail.com/ [2] https://lore.kernel.org/lkml/86C4CE7D-6D93-456B-AA82-F8ADEACA40B7@gmail.com/ -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmI9JqMVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsG3dkP/Ar7r8m4hc60kJE8JfXaxDpGOGka 2yVm0EPfwV1lFGq440p4mqKc1iRTVLNMPsyaG/ZhriIp8PlSUjXLW290Sty6Z8Pd zcxwDg09ZXkMoDX+lc2Wr9F0wpswWJjqU/TzGLP5/qkVMe46KheXIQSPJAp8tVUt u2of/MTgTVMa4r7Iex/+NFWCPr4lTkWkSfzVN/Jd1r91UOyzy4E1VFRNlXIk/Fc9 BFa67k0SHx/3FFElfwzFaejYUZjHjNzK3E1Zq8Q1vkWUxrzeEnzqTEiP7QaAi4Sa 7MbqyqQvNoPw3uvKu5kwjDE+LHMEPTsmuaKVFpAc+qCpMtZCI6g9Q48pzQsWBMO2 hZlEmYR9Zk0TpJp1flpOnNzoy7xPzNs0rcB3PaSOZyv+dTqtJ981IP+r4RNVlwje y3N9vq4RSAj/kAE/wi6FiPc/8vfbY71PbEXmg8556+kn3ne6aXl13ZrXIxz8w5jK bIgIFmrEPH7941KvFjoXhaFp/qv9hvLpWhQZu7CFRaj5V28qqUQ5TQFJREPePRtJ RFPEuOJqEGMxW/xbhcfrA1AO/y9Grxbe65e8Mph4YCfWpWaUww6vN01LC+k6UgDm Yq2u+wSFjWpRxOEPLWNsjnrZZgfdjk22O+TNOMs92X8/gXinmu3kZG5IUavahg7+ J0SsIjIXhmLGKdDm =KMDk -----END PGP SIGNATURE----- Merge tag 'kbuild-gnu11-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild update for C11 language base from Masahiro Yamada: "Kbuild -std=gnu11 updates for v5.18 Linus pointed out the benefits of C99 some years ago, especially variable declarations in loops [1]. At that time, we were not ready for the migration due to old compilers. Recently, Jakob Koschel reported a bug in list_for_each_entry(), which leaks the invalid pointer out of the loop [2]. In the discussion, we agreed that the time had come. Now that GCC 5.1 is the minimum compiler version, there is nothing to prevent us from going to -std=gnu99, or even straight to -std=gnu11. Discussions for a better list iterator implementation are ongoing, but this patch set must land first" [1] https://lore.kernel.org/all/CAHk-=wgr12JkKmRd21qh-se-_Gs69kbPgR9x4C+Es-yJV2GLkA@mail.gmail.com/ [2] https://lore.kernel.org/lkml/86C4CE7D-6D93-456B-AA82-F8ADEACA40B7@gmail.com/ * tag 'kbuild-gnu11-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: Kbuild: use -std=gnu11 for KBUILD_USERCFLAGS Kbuild: move to -std=gnu11 Kbuild: use -Wdeclaration-after-statement Kbuild: add -Wno-shift-negative-value where -Wextra is used |
||
Andreas Gruenbacher
|
631f871f07 |
fs/iomap: Fix buffered write page prefaulting
When part of the user buffer passed to generic_perform_write() or
iomap_file_buffered_write() cannot be faulted in for reading, the entire
write currently fails. The correct behavior would be to write all the
data that can be written, up to the point of failure.
Commit
|
||
Linus Torvalds
|
85c7000fda |
The highlights are:
- several changes to how snap context and snap realms are tracked (Xiubo Li). In particular, this should resolve a long-standing issue of high kworker CPU usage and various stalls caused by needless iteration over all inodes in the snap realm. - async create fixes to address hangs in some edge cases (Jeff Layton) - support for getvxattr MDS op for querying server-side xattrs, such as file/directory layouts and ephemeral pins (Milind Changire) - average latency is now maintained for all metrics (Venky Shankar) - some tweaks around handling inline data to make it fit better with netfs helper library (David Howells) Also a couple of memory leaks got plugged along with a few assorted fixups. Last but not least, Xiubo has stepped up to serve as a CephFS co-maintainer. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmI8qE4THGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzi9UvB/kBZt4mkyjqRC+KJ5rukw5q7lyAYGC1 QYVbuTuIJydyDvqQp9pXYFndlj10pb7ULnUlQlcBfntVAr9s7xx7ZKrKciE48MPT vLiJmq3MpEedM4oE4FgcJbmHtltDgZWvOxXB7renpHNeHuPeezNpKzaKQXGUHUDo +7cX5XWBzZk+AYbEvxQUsjDozcgDp31qf015mAX3r0P7XFkBB7xwZA7sb7Cw1GEr S6ZdlNoFcWUq0ULUdh2C7l5a2mKQnVnpOO3TMjE6tSqJ74iozRy4tO9aFgj99NEn D1rQbCLr3JPfY//JFyqEOIDYf3hepOMmEkoGHOFukckFKUe3yfJJxa3r =ESr5 -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.18-rc1' of https://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: "The highlights are: - several changes to how snap context and snap realms are tracked (Xiubo Li). In particular, this should resolve a long-standing issue of high kworker CPU usage and various stalls caused by needless iteration over all inodes in the snap realm. - async create fixes to address hangs in some edge cases (Jeff Layton) - support for getvxattr MDS op for querying server-side xattrs, such as file/directory layouts and ephemeral pins (Milind Changire) - average latency is now maintained for all metrics (Venky Shankar) - some tweaks around handling inline data to make it fit better with netfs helper library (David Howells) Also a couple of memory leaks got plugged along with a few assorted fixups. Last but not least, Xiubo has stepped up to serve as a CephFS co-maintainer" * tag 'ceph-for-5.18-rc1' of https://github.com/ceph/ceph-client: (27 commits) ceph: fix memory leak in ceph_readdir when note_last_dentry returns error ceph: uninitialized variable in debug output ceph: use tracked average r/w/m latencies to display metrics in debugfs ceph: include average/stdev r/w/m latency in mds metrics ceph: track average r/w/m latency ceph: use ktime_to_timespec64() rather than jiffies_to_timespec64() ceph: assign the ci only when the inode isn't NULL ceph: fix inode reference leakage in ceph_get_snapdir() ceph: misc fix for code style and logs ceph: allocate capsnap memory outside of ceph_queue_cap_snap() ceph: do not release the global snaprealm until unmounting ceph: remove incorrect and unused CEPH_INO_DOTDOT macro MAINTAINERS: add Xiubo Li as cephfs co-maintainer ceph: eliminate the recursion when rebuilding the snap context ceph: do not update snapshot context when there is no new snapshot ceph: zero the dir_entries memory when allocating it ceph: move to a dedicated slabcache for ceph_cap_snap ceph: add getvxattr op libceph: drop else branches in prepare_read_data{,_cont} ceph: fix comments mentioning i_mutex ... |
||
Linus Torvalds
|
b1b07ba356 |
New code for 5.18:
- Fix some incorrect mapping state being passed to iomap during COW - Don't create bogus selinux audit messages when deciding to degrade gracefully due to lack of privilege - Fix setattr implementation to use VFS helpers so that we drop setgid consistently with the other filesystems - Fix link/unlink/rename to check quota limits - Constify xfs_name_dotdot to prevent abuse of in-kernel symbols - Fix log livelock between the AIL and inodegc threads during recovery - Fix a log stall when the AIL races with pushers - Fix stalls in CIL flushes due to pinned inode cluster buffers during recovery - Fix log corruption due to incorrect usage of xfs_is_shutdown vs xlog_is_shutdown because during an induced fs shutdown, AIL writeback must continue until the log is shut down, even if the filesystem has already shut down -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmI3UQ8ACgkQ+H93GTRK tOttpBAAjz05QkClIYKlHREiWt4a2a0FnRBvkz2fvZb0Nk3kST/drJ14VAb1Q3g1 HvvOGwssyJShkbd6DBGTqrp9AzZ0mfhSYPpARtCNwOvUNV4tXwLdiBZ86H4/l3qy seS1tXW4pUKm6xhN9fnMw1sk4oJmPXq67JW+1h35oDaE6tpdd9lehFMpMxaTMjED 4WsU0UwKCWr4l1lKP1EXvh+ENJeo0QZpccNHV3ChLnWx0mPMRpf4tQP2A01oBPkh oXHSms0TV7FHDIv+Zil3kBgkfh266WhcPdZ4pfIqyQc51LzbgrxEqcrZHGI5XJjB zcQd8RkzOI68XIDchijOorOkQZmDEDIGlCgSY6q/JV4N2iFyF84hHedk2le5j97l Jmout2Xng3bgstl4963IjXk8SvPTebnI76a62XoEcHnf4KBROLKIU7wNCh5LXNvk CI6AWJGy6EAGEc3BHyPFZxZF72D9rUmRbPIJBc7rBhJgMoIy4L4sFlvVtyppbERb gMH9PNTLjY5/abQkV044iLOsEDh5FEPBIehoaENNo252vj2R/WsAAQL13l0cMsrN vAryGdAEelZEyg62k+HT4W87zjH0Kgtgli6zMdx7akYdjKbMOIZALEu61iBH7KT+ heAz8pU/krTy+Q583XU13eCWnY1wPrHRMwdGF+i4WUGiKuJSoYg= =uHT0 -----END PGP SIGNATURE----- Merge tag 'xfs-5.18-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs updates from Darrick Wong: "The biggest change this cycle is bringing XFS' inode attribute setting code back towards alignment with what the VFS does. IOWs, setgid bit handling should be a closer match with ext4 and btrfs behavior. The rest of the branch is bug fixes around the filesystem -- patching gaps in quota enforcement, removing bogus selinux audit messages, and fixing log corruption and problems with log recovery. There will be a second pull request later on in the merge window with more bug fixes. Dave Chinner will be taking over as XFS maintainer for one release cycle, starting from the day 5.18-rc1 drops until 5.19-rc1 is tagged so that I can focus on starting a massive design review for the (feature complete after five years) online repair feature. Summary: - Fix some incorrect mapping state being passed to iomap during COW - Don't create bogus selinux audit messages when deciding to degrade gracefully due to lack of privilege - Fix setattr implementation to use VFS helpers so that we drop setgid consistently with the other filesystems - Fix link/unlink/rename to check quota limits - Constify xfs_name_dotdot to prevent abuse of in-kernel symbols - Fix log livelock between the AIL and inodegc threads during recovery - Fix a log stall when the AIL races with pushers - Fix stalls in CIL flushes due to pinned inode cluster buffers during recovery - Fix log corruption due to incorrect usage of xfs_is_shutdown vs xlog_is_shutdown because during an induced fs shutdown, AIL writeback must continue until the log is shut down, even if the filesystem has already shut down" * tag 'xfs-5.18-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: xfs_is_shutdown vs xlog_is_shutdown cage fight xfs: AIL should be log centric xfs: log items should have a xlog pointer, not a mount xfs: async CIL flushes need pending pushes to be made stable xfs: xfs_ail_push_all_sync() stalls when racing with updates xfs: check buffer pin state after locking in delwri_submit xfs: log worker needs to start before intent/unlink recovery xfs: constify xfs_name_dotdot xfs: constify the name argument to various directory functions xfs: reserve quota for target dir expansion when renaming files xfs: reserve quota for dir expansion when linking/unlinking files xfs: refactor user/group quota chown in xfs_setattr_nonsize xfs: use setattr_copy to set vfs inode attributes xfs: don't generate selinux audit messages for capability testing xfs: add missing cmap->br_state = XFS_EXT_NORM update |
||
Linus Torvalds
|
f0614eefbf |
dax for 5.18
- Fix a crash due to a missing rcu_barrier() in dax_fs_exit() - Fix two miscellaneous doc issues -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYjp4zgAKCRDfioYZHlFs ZwgDAQD/jJE/uyFozwZR/YD8G293Hzs/acPUEQ/09t06W68SIAEAtwBr3WHKiFJL M+utFlm05fShOQl3lGwqiPS5/mQvzwc= =7AnI -----END PGP SIGNATURE----- Merge tag 'dax-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull DAX updates from Dan Williams: "Andrew has been shepherding major dax features that touch the core -mm through his tree, but I still collect the dax updates that are core-mm independent. - Fix a crash due to a missing rcu_barrier() in dax_fs_exit() - Fix two miscellaneous doc issues" * tag 'dax-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Fix missing kdoc for dax_device dax: make sure inodes are flushed before destroy cache fsdax: fix function description |
||
Linus Torvalds
|
52deda9551 |
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton: "Various misc subsystems, before getting into the post-linux-next material. 41 patches. Subsystems affected by this patch series: procfs, misc, core-kernel, lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump, taskstats, panic, kcov, resource, and ubsan" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits) Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang" kernel/resource: fix kfree() of bootmem memory again kcov: properly handle subsequent mmap calls kcov: split ioctl handling into locked and unlocked parts panic: move panic_print before kmsg dumpers panic: add option to dump all CPUs backtraces in panic_print docs: sysctl/kernel: add missing bit to panic_print taskstats: remove unneeded dead assignment kasan: no need to unset panic_on_warn in end_report() ubsan: no need to unset panic_on_warn in ubsan_epilogue() panic: unset panic_on_warn inside panic() docs: kdump: add scp example to write out the dump file docs: kdump: update description about sysfs file system support arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible cgroup: use irqsave in cgroup_rstat_flush_locked(). fat: use pointer to simple type in put_user() minix: fix bug when opening a file with O_DIRECT ... |
||
Linus Torvalds
|
3ce62cf4dc |
flexible-array transformations for 5.18-rc1
Hi Linus, Please, pull the following treewide patch that replaces zero-length arrays with flexible-array members. This patch has been baking in linux-next for a whole development cycle. Thanks -- Gustavo -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmI6GIUACgkQRwW0y0cG 2zFLWw/+OB1gZeQD3boKpUMntWnn6wjhUxdrO8CYkpzG+B+8TFECXNjy8HV1CSiw GKKRndYELOyYaD5o/F2vtPe10iPHbrdIlMFRPBRoht0/cvSZgzHlfT8EjWQwerYY dieztUFKjeSj0MXivdNDnKOTm8o9cz8KmCrWFP+My37Fasn/9+nBX8iNVIvAX4xy T+IVmjtDifQUsTs298UGnBvDeuZOiGHhXXU5rq6lIX0Rl554OsWZW94d6jUPj/h7 t1v6jdojNuyaMKn45/xnPj9VvmDiSu3K67m3fjRdzLPDOhISjr2fw4KEUOKdsebh yJ9t5u8IufyPbm9kyI+rZt+T8ZlV2/qt2+mt6QgtDMnWrs+4nU15JY0SHImMSBZQ rBEZcQlrIcGJ+CsNB8Y7jIGYO0SSkhodAvfl0LRA0AbTqLGqq0OkAQS5D52r3H2r uz6xdYb7kG43XaRyaAIPqhZsp/jk2NrXvEvin2tSaXZFR1cxp+oxcV2UajmnOU6i EIBS4PzJnYx2RZRa+h8YbBa/+D4N6+fj/tjmwBawiUBPjjaLAsGFNwUHqvBoD05S bk6oXi654NBwVjsknZ0grVz0TtSvdZ3uJL5FZApTOHITqH8vlxlNefmHri4vZRZO NN7NIQ0yaUCnorzMg+vP8ZtflhQwrMJbjwIS9YD0RHd7MBhYX8k= =xZD2 -----END PGP SIGNATURE----- Merge tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull flexible-array transformations from Gustavo Silva: "Treewide patch that replaces zero-length arrays with flexible-array members. This has been baking in linux-next for a whole development cycle" * tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: treewide: Replace zero-length arrays with flexible-array members |
||
Linus Torvalds
|
2e2d4650b3 |
fs.rt.v5.18
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYjr50AAKCRCRxhvAZXjc oroBAQC0daGrkl6XJirLzyMVjNWWbuSEdB+Q3ipMrwySxjYmtwD/XSPGAKGBbHQz 13EtvTMT8FlqSyDIKHYBM5uDyrWQZgs= =uJdq -----END PGP SIGNATURE----- Merge tag 'fs.rt.v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull mount attributes PREEMPT_RT update from Christian Brauner: "This contains Sebastian's fix to make changing mount attributes/getting write access compatible with CONFIG_PREEMPT_RT. The change only applies when users explicitly opt-in to real-time via CONFIG_PREEMPT_RT otherwise things are exactly as before. We've waited quite a long time with this to make sure folks could take a good look" * tag 'fs.rt.v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: fs/namespace: Boost the mount_lock.lock owner instead of spinning on PREEMPT_RT. |
||
Linus Torvalds
|
15f2e3d6c1 |
fs.v5.18
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYjrxKAAKCRCRxhvAZXjc oi77AQDdLziGyeKSHlyPw1xeWiqsZVRLhhD07F8kVaAkjg3ZWwEAs3jL0Pzfnd7J IIx//6YK46cbfc471FdVuUDfMjkfVAw= =VaFd -----END PGP SIGNATURE----- Merge tag 'fs.v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull mount_setattr updates from Christian Brauner: "This contains a few more patches to massage the mount_setattr() codepaths and one minor fix to reuse a helper we added some time back. The final two patches do similar cleanups in different ways. One patch is mine and the other is Al's who was nice enough to give me a branch for it. Since his came in later and my branch had been sitting in -next for quite some time we just put his on top instead of swap them" * tag 'fs.v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: mount_setattr(): clean the control flow and calling conventions fs: clean up mount_setattr control flow fs: don't open-code mnt_hold_writers() fs: simplify check in mount_setattr_commit() fs: add mnt_allow_writers() and simplify mount_setattr_prepare() |
||
Olga Kornievskaia
|
1d15d121cc |
NFSv4.1: don't retry BIND_CONN_TO_SESSION on session error
There is no reason to retry the operation if a session error had
occurred in such case result structure isn't filled out.
Fixes:
|
||
Jakob Koschel
|
3de24f3d70 |
NFS: replace usage of found with dedicated list iterator variable
To move the list iterator variable into the list_for_each_entry_*() macro in the future it should be avoided to use the list iterator variable after the loop body. To *never* use the list iterator variable after the loop it was concluded to use a separate iterator variable instead of a found boolean [1]. This removes the need to use a found variable and simply checking if the variable was set, can determine if the break/goto was hit. Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> |
||
Helge Deller
|
2cd50532ce |
fat: use pointer to simple type in put_user()
The put_user(val,ptr) macro wants a pointer to a simple type, but in fat_ioctl_filldir() the d_name field references an "array of chars". Be more accurate and explicitly give the pointer to the first character of the d_name[] array. I noticed that issue while trying to optimize the parisc put_user() macro and used an intermediate variable to store the pointer. In that case I got this error: In file included from include/linux/uaccess.h:11, from include/linux/compat.h:17, from fs/fat/dir.c:18: fs/fat/dir.c: In function `fat_ioctl_filldir': fs/fat/dir.c:725:33: error: invalid initializer 725 | if (put_user(0, d2->d_name) || \ | ^~ include/asm/uaccess.h:152:33: note: in definition of macro `__put_user' 152 | __typeof__(ptr) __ptr = ptr; \ | ^~~ fs/fat/dir.c:759:1: note: in expansion of macro `FAT_IOCTL_FILLDIR_FUNC' 759 | FAT_IOCTL_FILLDIR_FUNC(fat_ioctl_filldir, __fat_dirent) Andreas Schwab <schwab@linux-m68k.org> suggested to use __typeof__(&*(ptr)) __ptr = ptr; instead. This works, but nevertheless it's probably reasonable to fix the original caller too. Link: https://lkml.kernel.org/r/Ygo+A9MREmC1H3kr@p100 Signed-off-by: Helge Deller <deller@gmx.de> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: David Laight <David.Laight@aculab.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Qinghua Jin
|
9ce3c0d26c |
minix: fix bug when opening a file with O_DIRECT
Testcase: 1. create a minix file system and mount it 2. open a file on the file system with O_RDWR|O_CREAT|O_TRUNC|O_DIRECT 3. open fails with -EINVAL but leaves an empty file behind. All other open() failures don't leave the failed open files behind. It is hard to check the direct_IO op before creating the inode. Just as ext4 and btrfs do, this patch will resolve the issue by allowing to create the file with O_DIRECT but returning error when writing the file. Link: https://lkml.kernel.org/r/20220107133626.413379-1-qhjin.dev@gmail.com Signed-off-by: Qinghua Jin <qhjin.dev@gmail.com> Reported-by: Colin Ian King <colin.king@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrei Vagin
|
aeb213cdde |
fs/pipe.c: local vars have to match types of proper pipe_inode_info fields
head, tail, ring_size are declared as unsigned int, so all local variables that operate with these fields have to be unsigned to avoid signed integer overflow. Right now, it isn't an issue because the maximum pipe size is limited by 1U<<31. Link: https://lkml.kernel.org/r/20220106171946.36128-1-avagin@gmail.com Signed-off-by: Andrei Vagin <avagin@gmail.com> Suggested-by: Dmitry Safonov <0x7f454c46@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrei Vagin
|
5a519c8fe4 |
fs/pipe: use kvcalloc to allocate a pipe_buffer array
Right now, kcalloc is used to allocate a pipe_buffer array. The size of the pipe_buffer struct is 40 bytes. kcalloc allows allocating reliably chunks with sizes less or equal to PAGE_ALLOC_COSTLY_ORDER (3). It means that the maximum pipe size is 3.2MB in this case. In CRIU, we use pipes to dump processes memory. CRIU freezes a target process, injects a parasite code into it and then this code splices memory into pipes. If a maximum pipe size is small, we need to do many iterations or create many pipes. kvcalloc attempt to allocate physically contiguous memory, but upon failure, fall back to non-contiguous (vmalloc) allocation and so it isn't limited by PAGE_ALLOC_COSTLY_ORDER. The maximum pipe size for non-root users is limited by the /proc/sys/fs/pipe-max-size sysctl that is 1MB by default, so only the root user will be able to trigger vmalloc allocations. Link: https://lkml.kernel.org/r/20220104171058.22580-1-avagin@gmail.com Signed-off-by: Andrei Vagin <avagin@gmail.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Yang Li
|
e9f5d1017c |
proc/vmcore: fix vmcore_alloc_buf() kernel-doc comment
Fix a spelling problem to remove warnings found by running scripts/kernel-doc, which is caused by using 'make W=1'. fs/proc/vmcore.c:492: warning: Function parameter or member 'size' not described in 'vmcore_alloc_buf' fs/proc/vmcore.c:492: warning: Excess function parameter 'sizez' description in 'vmcore_alloc_buf' Link: https://lkml.kernel.org/r/20220129011449.105278-1-yang.lee@linux.alibaba.com Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Acked-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
David Hildenbrand
|
5039b17036 |
proc/vmcore: fix possible deadlock on concurrent mmap and read
Lockdep noticed that there is chance for a deadlock if we have concurrent
mmap, concurrent read, and the addition/removal of a callback.
As nicely explained by Boqun:
"Lockdep warned about the above sequences because rw_semaphore is a
fair read-write lock, and the following can cause a deadlock:
TASK 1 TASK 2 TASK 3
====== ====== ======
down_write(mmap_lock);
down_read(vmcore_cb_rwsem)
down_write(vmcore_cb_rwsem); // blocked
down_read(vmcore_cb_rwsem); // cannot get the lock because of the fairness
down_read(mmap_lock); // blocked
IOW, a reader can block another read if there is a writer queued by
the second reader and the lock is fair"
To fix this, convert to srcu to make this deadlock impossible. We need
srcu as our callbacks can sleep. With this change, I cannot trigger any
lockdep warnings.
======================================================
WARNING: possible circular locking dependency detected
5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.x86_64 #1 Not tainted
------------------------------------------------------
makedumpfile/542 is trying to acquire lock:
ffffffff832d2eb8 (vmcore_cb_rwsem){.+.+}-{3:3}, at: mmap_vmcore+0x340/0x580
but task is already holding lock:
ffff8880af226438 (&mm->mmap_lock#2){++++}-{3:3}, at: vm_mmap_pgoff+0x84/0x150
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&mm->mmap_lock#2){++++}-{3:3}:
lock_acquire+0xc3/0x1a0
__might_fault+0x4e/0x70
_copy_to_user+0x1f/0x90
__copy_oldmem_page+0x72/0xc0
read_from_oldmem+0x77/0x1e0
read_vmcore+0x2c2/0x310
proc_reg_read+0x47/0xa0
vfs_read+0x101/0x340
__x64_sys_pread64+0x5d/0xa0
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
-> #0 (vmcore_cb_rwsem){.+.+}-{3:3}:
validate_chain+0x9f4/0x2670
__lock_acquire+0x8f7/0xbc0
lock_acquire+0xc3/0x1a0
down_read+0x4a/0x140
mmap_vmcore+0x340/0x580
proc_reg_mmap+0x3e/0x90
mmap_region+0x504/0x880
do_mmap+0x38a/0x520
vm_mmap_pgoff+0xc1/0x150
ksys_mmap_pgoff+0x178/0x200
do_syscall_64+0x43/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&mm->mmap_lock#2);
lock(vmcore_cb_rwsem);
lock(&mm->mmap_lock#2);
lock(vmcore_cb_rwsem);
*** DEADLOCK ***
1 lock held by makedumpfile/542:
#0: ffff8880af226438 (&mm->mmap_lock#2){++++}-{3:3}, at: vm_mmap_pgoff+0x84/0x150
stack backtrace:
CPU: 0 PID: 542 Comm: makedumpfile Not tainted 5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.x86_64 #1
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Call Trace:
__lock_acquire+0x8f7/0xbc0
lock_acquire+0xc3/0x1a0
down_read+0x4a/0x140
mmap_vmcore+0x340/0x580
proc_reg_mmap+0x3e/0x90
mmap_region+0x504/0x880
do_mmap+0x38a/0x520
vm_mmap_pgoff+0xc1/0x150
ksys_mmap_pgoff+0x178/0x200
do_syscall_64+0x43/0x90
Link: https://lkml.kernel.org/r/20220119193417.100385-1-david@redhat.com
Fixes:
|
||
Hao Lee
|
3a72917ccf |
proc: alloc PATH_MAX bytes for /proc/${pid}/fd/ symlinks
It's not a standard approach that use __get_free_page() to alloc path buffer directly. We'd better use kmalloc and PATH_MAX. PAGE_SIZE is different on different archs. An unlinked file with very long canonical pathname will readlink differently because "(deleted)" eats into a buffer. --adobriyan [akpm@linux-foundation.org: remove now-unneeded cast] Link: https://lkml.kernel.org/r/Ye1fCxyZZ0I5lgOL@localhost.localdomain Signed-off-by: Hao Lee <haolee.swjtu@gmail.com> Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Kees Cook <keescook@chromium.org> Cc: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
194dfe88d6 |
asm-generic updates for 5.18
There are three sets of updates for 5.18 in the asm-generic tree: - The set_fs()/get_fs() infrastructure gets removed for good. This was already gone from all major architectures, but now we can finally remove it everywhere, which loses some particularly tricky and error-prone code. There is a small merge conflict against a parisc cleanup, the solution is to use their new version. - The nds32 architecture ends its tenure in the Linux kernel. The hardware is still used and the code is in reasonable shape, but the mainline port is not actively maintained any more, as all remaining users are thought to run vendor kernels that would never be updated to a future release. There are some obvious conflicts against changes to the removed files. - A series from Masahiro Yamada cleans up some of the uapi header files to pass the compile-time checks. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI69BsACgkQmmx57+YA GNn/zA//f4d5VTT0ThhRxRWTu9BdThGHoB8TUcY7iOhbsWu0X/913NItRC3UeWNl IdmisaXgVtirg1dcC2pWUmrcHdoWOCEGfK4+Zr2NhSWfuZDWvODHK9pGWk4WLnhe cQgUNBvIuuAMryGtrOBwHPO4TpfCyy2ioeVP36ZfcsWXdDxTrqfaq/56mk3sxIP6 sUTk1UEjut9NG4C9xIIvcSU50R3l6LryQE/H9kyTLtaSvfvTOvprcVYCq0GPmSzo DtQ1Wwa9zbJ+4EqoMiP5RrgQwWvOTg2iRByLU8ytwlX3e/SEF0uihvMv1FQbL8zG G8RhGUOKQSEhaBfc3lIkm8GpOVPh0uHzB6zhn7daVmAWtazRD2Nu59BMjipa+ims a8Z58iHH7jRAnKeEkVZqXKb1CEiUxaQx/IeVPzN4QlwMhDtwrI76LY7ZJ1zCqTGY ENG0yRLav1XselYBslOYXGtOEWcY5EZPWqLyWbp4P9vz2g0Fe0gZxoIOvPmNQc89 QnfXpCt7vm/DGkyO255myu08GOLeMkisVqUIzLDB9avlym5mri7T7vk9abBa2YyO CRpTL5gl1/qKPWuH1UI5mvhT+sbbBE2SUHSuy84btns39ZKKKynwCtdu+hSQkKLE h9pV30Gf1cLTD4JAE0RWlUgOmbBLVp34loTOexQj4MrLM1noOnw= =vtCN -----END PGP SIGNATURE----- Merge tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "There are three sets of updates for 5.18 in the asm-generic tree: - The set_fs()/get_fs() infrastructure gets removed for good. This was already gone from all major architectures, but now we can finally remove it everywhere, which loses some particularly tricky and error-prone code. There is a small merge conflict against a parisc cleanup, the solution is to use their new version. - The nds32 architecture ends its tenure in the Linux kernel. The hardware is still used and the code is in reasonable shape, but the mainline port is not actively maintained any more, as all remaining users are thought to run vendor kernels that would never be updated to a future release. - A series from Masahiro Yamada cleans up some of the uapi header files to pass the compile-time checks" * tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits) nds32: Remove the architecture uaccess: remove CONFIG_SET_FS ia64: remove CONFIG_SET_FS support sh: remove CONFIG_SET_FS support sparc64: remove CONFIG_SET_FS support lib/test_lockup: fix kernel pointer check for separate address spaces uaccess: generalize access_ok() uaccess: fix type mismatch warnings from access_ok() arm64: simplify access_ok() m68k: fix access_ok for coldfire MIPS: use simpler access_ok() MIPS: Handle address errors for accesses above CPU max virtual user address uaccess: add generic __{get,put}_kernel_nofault nios2: drop access_ok() check from __put_user() x86: use more conventional access_ok() definition x86: remove __range_not_ok() sparc64: add __{get,put}_kernel_nofault() nds32: fix access_ok() checks in get/put_user uaccess: fix nios2 and microblaze get_user_8() sparc64: fix building assembly files ... |
||
Christoph Hellwig
|
61285ff72a |
fs: do not pass __GFP_HIGHMEM to bio_alloc in do_mpage_readpage
The mpage bio alloc cleanup accidentally removed clearing ~GFP_KERNEL
bits from the mask passed to bio_alloc. Fix this up in a slightly
less obsfucated way that mirrors what iomap does in its readpage code.
Fixes:
|
||
Linus Torvalds
|
6b1f86f8e9 |
Filesystem folio changes for 5.18
Primarily this series converts some of the address_space operations to take a folio instead of a page. ->is_partially_uptodate() takes a folio instead of a page and changes the type of the 'from' and 'count' arguments to make it obvious they're bytes. ->invalidatepage() becomes ->invalidate_folio() and has a similar type change. ->launder_page() becomes ->launder_folio() ->set_page_dirty() becomes ->dirty_folio() and adds the address_space as an argument. There are a couple of other misc changes up front that weren't worth separating into their own pull request. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4hqMACgkQDpNsjXcp gj7r7Af/fVJ7m8kKqjP/IayX3HiJRuIDQw+vM++BlRNXdjz+IyED6whdmFGxJeOY BMyT+8ApOAz7ErS4G+7fAv4ScJK/aEgFUsnSeAiCp0PliiEJ5NNJzElp6sVmQ7H5 SX7+Ek444FZUGsQuy0qL7/ELpR3ditnD7x+5U2g0p5TeaHGUQn84crRyfR4xuhNG EBD9D71BOb7OxUcOHe93pTkK51QsQ0aCrcIsB1tkK5KR0BAthn1HqF7ehL90Rvrr omx5M7aDWGY4oj7IKrhlAs+55Ah2WaOzrZBp0FXNbr4UENDBKWKyUxErwa4xPkf6 Gm1iQG/CspOHnxN3YWsd5WjtlL3A+A== =cOiq -----END PGP SIGNATURE----- Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache Pull filesystem folio updates from Matthew Wilcox: "Primarily this series converts some of the address_space operations to take a folio instead of a page. Notably: - a_ops->is_partially_uptodate() takes a folio instead of a page and changes the type of the 'from' and 'count' arguments to make it obvious they're bytes. - a_ops->invalidatepage() becomes ->invalidate_folio() and has a similar type change. - a_ops->launder_page() becomes ->launder_folio() - a_ops->set_page_dirty() becomes ->dirty_folio() and adds the address_space as an argument. There are a couple of other misc changes up front that weren't worth separating into their own pull request" * tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache: (53 commits) fs: Remove aops ->set_page_dirty fb_defio: Use noop_dirty_folio() fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio fs: Convert __set_page_dirty_buffers to block_dirty_folio nilfs: Convert nilfs_set_page_dirty() to nilfs_dirty_folio() mm: Convert swap_set_page_dirty() to swap_dirty_folio() ubifs: Convert ubifs_set_page_dirty to ubifs_dirty_folio f2fs: Convert f2fs_set_node_page_dirty to f2fs_dirty_node_folio f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio f2fs: Convert f2fs_set_meta_page_dirty to f2fs_dirty_meta_folio afs: Convert afs_dir_set_page_dirty() to afs_dir_dirty_folio() btrfs: Convert extent_range_redirty_for_io() to use folios fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio btrfs: Convert from set_page_dirty to dirty_folio fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio() fs: Add aops->dirty_folio fs: Remove aops->launder_page orangefs: Convert launder_page to launder_folio nfs: Convert from launder_page to launder_folio fuse: Convert from launder_page to launder_folio ... |
||
Linus Torvalds
|
9030fb0bb9 |
Folio changes for 5.18
- Rewrite how munlock works to massively reduce the contention on i_mmap_rwsem (Hugh Dickins): https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/ - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig): https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/ - Convert GUP to use folios and make pincount available for order-1 pages. (Matthew Wilcox) - Convert a few more truncation functions to use folios (Matthew Wilcox) - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox) - Convert rmap_walk to use folios (Matthew Wilcox) - Convert most of shrink_page_list() to use a folio (Matthew Wilcox) - Add support for creating large folios in readahead (Matthew Wilcox) -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4ucgACgkQDpNsjXcp gj69Wgf6AwqwmO5Tmy+fLScDPqWxmXJofbocae1kyoGHf7Ui91OK4U2j6IpvAr+g P/vLIK+JAAcTQcrSCjymuEkf4HkGZOR03QQn7maPIEe4eLrZRQDEsmHC1L9gpeJp s/GMvDWiGE0Tnxu0EOzfVi/yT+qjIl/S8VvqtCoJv1HdzxitZ7+1RDuqImaMC5MM Qi3uHag78vLmCltLXpIOdpgZhdZexCdL2Y/1npf+b6FVkAJRRNUnA0gRbS7YpoVp CbxEJcmAl9cpJLuj5i5kIfS9trr+/QcvbUlzRxh4ggC58iqnmF2V09l2MJ7YU3XL v1O/Elq4lRhXninZFQEm9zjrri7LDQ== =n9Ad -----END PGP SIGNATURE----- Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache Pull folio updates from Matthew Wilcox: - Rewrite how munlock works to massively reduce the contention on i_mmap_rwsem (Hugh Dickins): https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/ - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig): https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/ - Convert GUP to use folios and make pincount available for order-1 pages. (Matthew Wilcox) - Convert a few more truncation functions to use folios (Matthew Wilcox) - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox) - Convert rmap_walk to use folios (Matthew Wilcox) - Convert most of shrink_page_list() to use a folio (Matthew Wilcox) - Add support for creating large folios in readahead (Matthew Wilcox) * tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits) mm/damon: minor cleanup for damon_pa_young selftests/vm/transhuge-stress: Support file-backed PMD folios mm/filemap: Support VM_HUGEPAGE for file mappings mm/readahead: Switch to page_cache_ra_order mm/readahead: Align file mappings for non-DAX mm/readahead: Add large folio readahead mm: Support arbitrary THP sizes mm: Make large folios depend on THP mm: Fix READ_ONLY_THP warning mm/filemap: Allow large folios to be added to the page cache mm: Turn can_split_huge_page() into can_split_folio() mm/vmscan: Convert pageout() to take a folio mm/vmscan: Turn page_check_references() into folio_check_references() mm/vmscan: Account large folios correctly mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios mm/vmscan: Free non-shmem folios without splitting them mm/rmap: Constify the rmap_walk_control argument mm/rmap: Convert rmap_walk() to take a folio mm: Turn page_anon_vma() into folio_anon_vma() mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read() ... |
||
Linus Torvalds
|
3bf03b9a08 |
Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton: - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs - Most the MM patches which precede the patches in Willy's tree: kasan, pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap, sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb, userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp, cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap, zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon. * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits) mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release() Docs/ABI/testing: add DAMON sysfs interface ABI document Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface selftests/damon: add a test for DAMON sysfs interface mm/damon/sysfs: support DAMOS stats mm/damon/sysfs: support DAMOS watermarks mm/damon/sysfs: support schemes prioritization mm/damon/sysfs: support DAMOS quotas mm/damon/sysfs: support DAMON-based Operation Schemes mm/damon/sysfs: support the physical address space monitoring mm/damon/sysfs: link DAMON for virtual address spaces monitoring mm/damon: implement a minimal stub for sysfs-based DAMON interface mm/damon/core: add number of each enum type values mm/damon/core: allow non-exclusive DAMON start/stop Docs/damon: update outdated term 'regions update interval' Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling Docs/vm/damon: call low level monitoring primitives the operations mm/damon: remove unnecessary CONFIG_DAMON option mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() mm/damon/dbgfs-test: fix is_target_id() change ... |
||
Hugh Dickins
|
b698f0a177 |
mm/fs: delete PF_SWAPWRITE
PF_SWAPWRITE has been redundant since v3.2 commit
|
||
Nadav Amit
|
824ddc601a |
userfaultfd: provide unmasked address on page-fault
Userfaultfd is supposed to provide the full address (i.e., unmasked) of
the faulting access back to userspace. However, that is not the case for
quite some time.
Even running "userfaultfd_demo" from the userfaultfd man page provides the
wrong output (and contradicts the man page). Notice that
"UFFD_EVENT_PAGEFAULT event" shows the masked address (7fc5e30b3000) and
not the first read address (0x7fc5e30b300f).
Address returned by mmap() = 0x7fc5e30b3000
fault_handler_thread():
poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
(uffdio_copy.copy returned 4096)
Read address 0x7fc5e30b300f in main(): A
Read address 0x7fc5e30b340f in main(): A
Read address 0x7fc5e30b380f in main(): A
Read address 0x7fc5e30b3c0f in main(): A
The exact address is useful for various reasons and specifically for
prefetching decisions. If it is known that the memory is populated by
certain objects whose size is not page-aligned, then based on the faulting
address, the uffd-monitor can decide whether to prefetch and prefault the
adjacent page.
This bug has been for quite some time in the kernel: since commit
|
||
Muchun Song
|
f53bf711d4 |
mm: dcache: use kmem_cache_alloc_lru() to allocate dentry
Like inode cache, the dentry will also be added to its memcg list_lru. So replace kmem_cache_alloc() with kmem_cache_alloc_lru() to allocate dentry. Link: https://lkml.kernel.org/r/20220228122126.37293-8-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Muchun Song
|
65d3af647b |
f2fs: allocate inode by using alloc_inode_sb()
The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-6-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Muchun Song
|
fd60b28842 |
fs: allocate inode by using alloc_inode_sb()
The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4] Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Muchun Song
|
8b9f3ac5b0 |
fs: introduce alloc_inode_sb() to allocate filesystems specific inode
The allocated inode cache is supposed to be added to its memcg list_lru which should be allocated as well in advance. That can be done by kmem_cache_alloc_lru() which allocates object and list_lru. The file systems is main user of it. So introduce alloc_inode_sb() to allocate file system specific inodes and set up the inode reclaim context properly. The file system is supposed to use alloc_inode_sb() to allocate inodes. In later patches, we will convert all users to the new API. Link: https://lkml.kernel.org/r/20220228122126.37293-4-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Minchan Kim
|
c0226eb8bd |
mm: fs: fix lru_cache_disabled race in bh_lru
Check lru_cache_disabled under bh_lru_lock. Otherwise, it could introduce
race below and it fails to migrate pages containing buffer_head.
CPU 0 CPU 1
bh_lru_install
lru_cache_disable
lru_cache_disabled = false
atomic_inc(&lru_disable_count);
invalidate_bh_lrus_cpu of CPU 0
bh_lru_lock
__invalidate_bh_lrus
bh_lru_unlock
bh_lru_lock
install the bh
bh_lru_unlock
WHen this race happens a CMA allocation fails, which is critical for
the workload which depends on CMA.
Link: https://lkml.kernel.org/r/20220308180709.2017638-1-minchan@kernel.org
Fixes:
|
||
Anthony Iliopoulos
|
a128b054ce |
mount: warn only once about timestamp range expiration
Commit
|
||
NeilBrown
|
a64239d0ef |
f2fs: replace congestion_wait() calls with io_schedule_timeout()
As congestion is no longer tracked, congestion_wait() is effectively equivalent to io_schedule_timeout(). So introduce f2fs_io_schedule_timeout() which sets TASK_UNINTERRUPTIBLE and call that instead. Link: https://lkml.kernel.org/r/164549983744.9187.6425865370954230902.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Paolo Valente <paolo.valente@linaro.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
NeilBrown
|
b9b1335e64 |
remove bdi_congested() and wb_congested() and related functions
These functions are no longer useful as no BDIs report congestions any more. Removing the test on bdi_write_contested() in current_may_throttle() could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE is set. So replace the calls by 'false' and simplify the code - and remove the functions. [akpm@linux-foundation.org: fix build] Link: https://lkml.kernel.org/r/164549983742.9187.2570198746005819592.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs] Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Paolo Valente <paolo.valente@linaro.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
NeilBrown
|
fe55d563d4 |
remove inode_congested()
inode_congested() reports if the backing-device for the inode is congested. No bdi reports congestion any more, so this always returns 'false'. So remove inode_congested() and related functions, and remove the call sites, assuming that inode_congested() always returns 'false'. Link: https://lkml.kernel.org/r/164549983741.9187.2174285592262191311.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Paolo Valente <paolo.valente@linaro.org> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |