linux

iv/linux

Author	SHA1	Message	Date
Linus Torvalds	b485625078	sysctl: treewide: constify the ctl_table argument of proc_handlers Summary - const qualify struct ctl_table args in proc_handlers: This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified. -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEErkcJVyXmMSXOyyeQupfNUreWQU8FAmahVOcACgkQupfNUreW QU8QdgwAiDZMEQ6psvDfLGQlOvHyp21exZFC/i1OCrCq4BmTu83iYcojIKSaZ0q6 pfR9eYTVhNfAGsV2hQJvvUQPSVNESB/A7Z2zGLVQmZcOerAN17L2C7uA1SkbRBel P3yoMX6wruVCnBgQTcjktoBgpDqaOiqLTSh8Ko512qYOClLf22sCbbk+doiOwMF6 zbzh9bRo+qTuTgW30gwzRNml/lA1/XylSsM/cEFCxsc8k22lCneP+iWKDAfWtsr6 0GL+SvbBTyIEMFFvSw+0MUW30QctxaCYA1pBmn8nmc6eJ2yDly75/r8EWRr7zqDW SrRiXYDAzNcW3tiOdC2FW/J95VX62DiSQporVZ/OvahBKpQKGaLLNjx1EE3qGKIk ggnYiDmIO88ZBHoRbDDK9EdGGqrIZfRx9JeFD3bMOrwT3Ffxw30rwcYpzlOyjFz3 TIUeNthGXoQNRxZfTE+iybotKWcXnxCNe4U1Mvc2liY6+Z0jazDWfs+jV7CiuX1G Xk8jpTCR =JMg4 -----END PGP SIGNATURE----- Merge tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl Pull sysctl constification from Joel Granados: "Treewide constification of the ctl_table argument of proc_handlers using a coccinelle script and some manual code formatting fixups. This is a prerequisite to moving the static ctl_table structs into read-only data section which will ensure that proc_handler function pointers cannot be modified" * tag 'constfy-sysctl-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/sysctl/sysctl: sysctl: treewide: constify the ctl_table argument of proc_handlers	2024-07-25 12:58:36 -07:00
Linus Torvalds	f9bcc61ad1	This pull request contains the following changes for UML: - Support for preemption - i386 Rust support - Huge cleanup by Benjamin Berg - UBSAN support - Removal of dead code -----BEGIN PGP SIGNATURE----- iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmahIpkWHHJpY2hhcmRA c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wW4PD/wN03iDGNTPhegGgXTJTSwA8Gwk i5JTEmhc84ifE9/bJpru8w4mcLMiWLWFIpF4bGcqfKLp67tTi3jn9Vk7ivaYkn2G S875GqjdyqMVMfhJX+1qTxM6q/J5B7XGUpt1Zrot3AY1ANxnlwYscWX8jNvwmf+5 eCK9+xldkNWh1N67EjwsDgH6kkWyx3fcEe4E3gjXY0eSZtIwO/ZXYHSCSKznJOfu iXo1Sx02w8TZp4tf/EwpWR1SMkPL23X8Of+rmiyI5udyLZixTnrFlclu8WUK4ZBO ExYvOrzyYZ3E/mPFZf0E88h8xC3ETLsiHO3++JRAM1uDMp1+a6tPK7Bi6NTytemH PIT++XRiORAbXu3aSTjpFDAhTHIMZ925eJMvQAtVhtAAwbkjSNh9NbusbMiucPNm vvtYrEqYjPJpx+HRxy8kUywe/+jFLYofSDn6YrNRM+3HaM44YgkvbD6AOEMxWq19 YWkflmkDADez6eti03bAbiVuBB1v+Vnuz15ofrx45IUubb3uGVJYwEqQA5u8bAVr H4NeIWDRpXOuYLgSyxRLFFVYhe6eAWbAXeSWBFxcGNDY6OBpMqr7kgV1mBOZtooK 8aBgZ0YcyiTpmiEevskkNWSBnqUMKIdztKkD7Db9HfCgd9yy7Vvfl+iLJTIqFJ5m JxpvTy3it53ghQj40A== =ybqq -----END PGP SIGNATURE----- Merge tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Support for preemption - i386 Rust support - Huge cleanup by Benjamin Berg - UBSAN support - Removal of dead code * tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (41 commits) um: vector: always reset vp->opened um: vector: remove vp->lock um: register power-off handler um: line: always fill *error_out in setup_one_line() um: remove pcap driver from documentation um: Enable preemption in UML um: refactor TLB update handling um: simplify and consolidate TLB updates um: remove force_flush_all from fork_handler um: Do not flush MM in flush_thread um: Delay flushing syscalls until the thread is restarted um: remove copy_context_skas0 um: remove LDT support um: compress memory related stub syscalls while adding them um: Rework syscall handling um: Add generic stub_syscall6 function um: Create signal stack memory assignment in stub_data um: Remove stub-data.h include from common-offsets.h um: time-travel: fix signal blocking race/hang um: time-travel: remove time_exit() ...	2024-07-25 12:33:08 -07:00
Joel Granados	78eb4ea25c	sysctl: treewide: constify the ctl_table argument of proc_handlers const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified. This patch has been generated by the following coccinelle script: ``` virtual patch @r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer\|interval)_handler\|sched_(rt\|rr)_handler\|rds_tcp_skbuf_handler\|proc_sctp_do_(hmac_alg\|rto_min\|rto_max\|udp_port\|alpha_beta\|auth\|probe_interval)"; @@ int func( - struct ctl_table ctl + const struct ctl_table ctl ,int write, void buffer, size_t lenp, loff_t ppos); @r2@ identifier func, ctl, write, buffer, lenp, ppos; @@ int func( - struct ctl_table ctl + const struct ctl_table ctl ,int write, void buffer, size_t lenp, loff_t ppos) { ... } @r3@ identifier func; @@ int func( - struct ctl_table * + const struct ctl_table * ,int , void , size_t , loff_t ); @r4@ identifier func, ctl; @@ int func( - struct ctl_table ctl + const struct ctl_table ctl ,int , void , size_t , loff_t ); @r5@ identifier func, write, buffer, lenp, ppos; @@ int func( - struct ctl_table * + const struct ctl_table * ,int write, void buffer, size_t lenp, loff_t ppos); ``` Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where adjusted. * The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration. Co-developed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Co-developed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Joel Granados <j.granados@samsung.com>	2024-07-24 20:59:29 +02:00
Linus Torvalds	7a3fad30fd	Random number generator updates for Linux 6.11-rc1. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmaarzgACgkQSfxwEqXe A66ZWBAAlhXx8bve0uKlDRK8fffWHgruho/fOY4lZJ137AKwA9JCtmOyqdfL4Dmk VxFe7pEQJlQhcA/6kH54uO7SBXwfKlKZJth6SYnaCRMUIbFifHjjIQ0QqldjEKi0 rP90Hu4FVsbwQC7u9i9lQj9n2P36zb6pn83BzpZQ/2PtoVCSCrdSJUe0Rxa3H3GN 0+nNkDSXQt5otCByLaeE3x7KJgXLWL9+G2eFSFLTZ8rSVfMx1CdOIAG37WlLGdWm BaFYPDKMyBTVvVJBNgAe9YSqtrsZ5nlmLz+Z9wAe/hTL7RlL03kWUu34/Udcpull zzMDH0WMntiGK3eFQ2gOYSWqypvAjwHgn3BzqNmjUb69+89mZsdU1slcvnxWsUwU D3vphrscaqarF629tfsXti3jc5PoXwUTjROZVcCyeFPBhyAZgzK8xUvPpJO+RT+K EuUABob9cpA6FCpW/QeolDmMDhXlNT8QgsZu1juokZac2xP3Ly3REyEvT7HLbU2W ZJjbEqm1ppp3RmGELUOJbyhwsLrnbt+OMDO7iEWoG8aSFK4diBK/ZM6WvLMkr8Oi 7ioXGIsYkCy3c47wpZKTrAapOPJp5keqNAiHSEbXw8mozp6429QAEZxNOcczgHKC Ea2JzRkctqutcIT+Slw/uUe//i1iSsIHXbE81fp5udcQTJcUByo= =P8aI -----END PGP SIGNATURE----- Merge tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: "This adds getrandom() support to the vDSO. First, it adds a new kind of mapping to mmap(2), MAP_DROPPABLE, which lets the kernel zero out pages anytime under memory pressure, which enables allocating memory that never gets swapped to disk but also doesn't count as being mlocked. Then, the vDSO implementation of getrandom() is introduced in a generic manner and hooked into random.c. Next, this is implemented on x86. (Also, though it's not ready for this pull, somebody has begun an arm64 implementation already) Finally, two vDSO selftests are added. There are also two housekeeping cleanup commits" * tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: MAINTAINERS: add random.h headers to RNG subsection random: note that RNDGETPOOL was removed in 2.6.9-rc2 selftests/vDSO: add tests for vgetrandom x86: vdso: Wire up getrandom() vDSO implementation random: introduce generic vDSO getrandom() implementation mm: add MAP_DROPPABLE for designating always lazily freeable mappings	2024-07-24 10:29:50 -07:00
Linus Torvalds	d1e9a63dcd	vfs-6.11-rc1.fixes.2 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZqDFUwAKCRCRxhvAZXjc omD6APwJKlepwDYlu5XZptI6/1kmai6SqaYnifTX1+ELR/rQQAD/Z37aho42v2JZ NYr+KFj02vj7ryKA5OWuSD8cw+6GlwQ= =dfob -----END PGP SIGNATURE----- Merge tag 'vfs-6.11-rc1.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "VFS: - The new 64bit mount ids start after the old mount id, i.e., at the first non-32 bit value. However, we started counting one id too late and thus lost 4294967296 as the first valid id. Fix that. - Update a few comments on some vfs_() creation helpers. - Move copying of the xattr name out from the locks required to start a filesystem write. - Extend the filelock lock UAF fix to the compat code as well. - Now that we added the ability to look up an inode under RCU it's possible that lockless hash lookup can find and lock an inode after it gets I_FREEING set. It then waits until inode teardown in evict() is finished. The flag however is still set after evict() has woken up all waiters. If the inode lock is taken late enough on the waiting side after hash removal and wakeup happened the waiting thread will never be woken. Before RCU based lookup this was synchronized via the inode_hash_lock. But since unhashing requires the inode lock as well we can check whether the inode is unhashed while holding inode lock even without holding inode_hash_lock. pidfd: - The nsproxy structure contains nearly all of the namespaces associated with a task. When a namespace type isn't supported nsproxy might contain a NULL pointer or always point to the initial namespace type. The logic isn't consistent. So when deriving namespace fds we need to ensure that the namespace type is supported. First, so that we don't risk dereferncing NULL pointers. The correct bigger fix would be to change all namespaces to always set a valid namespace pointer in struct nsproxy independent of whether or not it is compiled in. But that requires quite a few changes. Second, so that we don't allow deriving namespace fds when the namespace type doesn't exist and thus when they couldn't also be derived via /proc/self/ns/. - Add missing selftests for the new pidfd ioctls to derive namespace fds. This simply extends the already existing testsuite. netfs: - Fix debug logging and fix kconfig variable name so it actually works. - Fix writeback that goes both to the server and cache. The streams are only activated once a subreq is added. When a server write happens the subreq doesn't need to have finished by the time the cache write is started. If the server write has already finished by the time the cache write is about to start the cache write will operate on a folio that might already have been reused. Fix this by preactivating the cache write. - Limit cachefiles subreq size for cache writes to MAX_RW_COUNT" tag 'vfs-6.11-rc1.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: inode: clarify what's locked vfs: Fix potential circular locking through setxattr() and removexattr() filelock: Fix fcntl/close race recovery compat path fs: use all available ids cachefiles: Set the max subreq size for cache writes to MAX_RW_COUNT netfs: Fix writeback that needs to go to both server and cache pidfs: add selftests for new namespace ioctls pidfs: handle kernels without namespaces cleanly pidfs: when time ns disabled add check for ioctl vfs: correct the comments of vfs_*() helpers vfs: handle __wait_on_freeing_inode() and evict() race netfs: Rename CONFIG_FSCACHE_DEBUG to CONFIG_NETFS_DEBUG netfs: Revert "netfs: Switch debug logging to pr_debug()"	2024-07-24 09:42:51 -07:00
Linus Torvalds	e44be00289	hostfs: fix folio conversion Commit e3ec0fe944d2 ("hostfs: Convert hostfs_read_folio() to use a folio") simplified hostfs_read_folio(), but in the process of converting to using folios natively also mis-used the folio_zero_tail() function due to the very confusing API of that function. Very arguably it's folio_zero_tail() API itself that is buggy, since it would make more sense (and the documentation kind of implies) that the third argument would be the pointer to the beginning of the folio buffer. But no, the third argument to folio_zero_tail() is where we should start zeroing the tail (even if we already also pass in the offset separately as the second argument). So fix the hostfs caller, and we can leave any folio_zero_tail() sanity cleanup for later. Reported-and-tested-by: Maciej Żenczykowski <maze@google.com> Fixes: e3ec0fe944d2 ("hostfs: Convert hostfs_read_folio() to use a folio") Link: https://lore.kernel.org/all/CANP3RGceNzwdb7w=vPf5=7BCid5HVQDmz1K5kC9JG42+HVAh_g@mail.gmail.com/ Cc: Matthew Wilcox <willy@infradead.org> Cc: Christian Brauner <brauner@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-07-24 09:25:15 -07:00
Christian Brauner	f5e5e97c71	inode: clarify what's locked In __wait_on_freeing_inode() we warn in case the inode_hash_lock is held but the inode is unhashed. We then release the inode_lock. So using "locked" as parameter name is confusing. Use is_inode_hash_locked as parameter name instead. Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-24 11:11:40 +02:00
David Howells	c3a5e3e872	vfs: Fix potential circular locking through setxattr() and removexattr() When using cachefiles, lockdep may emit something similar to the circular locking dependency notice below. The problem appears to stem from the following: (1) Cachefiles manipulates xattrs on the files in its cache when called from ->writepages(). (2) The setxattr() and removexattr() system call handlers get the name (and value) from userspace after taking the sb_writers lock, putting accesses of the vma->vm_lock and mm->mmap_lock inside of that. (3) The afs filesystem uses a per-inode lock to prevent multiple revalidation RPCs and in writeback vs truncate to prevent parallel operations from deadlocking against the server on one side and local page locks on the other. Fix this by moving the getting of the name and value in {get,remove}xattr() outside of the sb_writers lock. This also has the minor benefits that we don't need to reget these in the event of a retry and we never try to take the sb_writers lock in the event we can't pull the name and value into the kernel. Alternative approaches that might fix this include moving the dispatch of a write to the cache off to a workqueue or trying to do without the validation lock in afs. Note that this might also affect other filesystems that use netfslib and/or cachefiles. ====================================================== WARNING: possible circular locking dependency detected 6.10.0-build2+ #956 Not tainted ------------------------------------------------------ fsstress/6050 is trying to acquire lock: ffff888138fd82f0 (mapping.invalidate_lock#3){++++}-{3:3}, at: filemap_fault+0x26e/0x8b0 but task is already holding lock: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (&vma->vm_lock->lock){++++}-{3:3}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 down_write+0x3b/0x50 vma_start_write+0x6b/0xa0 vma_link+0xcc/0x140 insert_vm_struct+0xb7/0xf0 alloc_bprm+0x2c1/0x390 kernel_execve+0x65/0x1a0 call_usermodehelper_exec_async+0x14d/0x190 ret_from_fork+0x24/0x40 ret_from_fork_asm+0x1a/0x30 -> #3 (&mm->mmap_lock){++++}-{3:3}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 __might_fault+0x7c/0xb0 strncpy_from_user+0x25/0x160 removexattr+0x7f/0x100 __do_sys_fremovexattr+0x7e/0xb0 do_syscall_64+0x9f/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #2 (sb_writers#14){.+.+}-{0:0}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 percpu_down_read+0x3c/0x90 vfs_iocb_iter_write+0xe9/0x1d0 __cachefiles_write+0x367/0x430 cachefiles_issue_write+0x299/0x2f0 netfs_advance_write+0x117/0x140 netfs_write_folio.isra.0+0x5ca/0x6e0 netfs_writepages+0x230/0x2f0 afs_writepages+0x4d/0x70 do_writepages+0x1e8/0x3e0 filemap_fdatawrite_wbc+0x84/0xa0 __filemap_fdatawrite_range+0xa8/0xf0 file_write_and_wait_range+0x59/0x90 afs_release+0x10f/0x270 __fput+0x25f/0x3d0 __do_sys_close+0x43/0x70 do_syscall_64+0x9f/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #1 (&vnode->validate_lock){++++}-{3:3}: __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 down_read+0x95/0x200 afs_writepages+0x37/0x70 do_writepages+0x1e8/0x3e0 filemap_fdatawrite_wbc+0x84/0xa0 filemap_invalidate_inode+0x167/0x1e0 netfs_unbuffered_write_iter+0x1bd/0x2d0 vfs_write+0x22e/0x320 ksys_write+0xbc/0x130 do_syscall_64+0x9f/0x100 entry_SYSCALL_64_after_hwframe+0x76/0x7e -> #0 (mapping.invalidate_lock#3){++++}-{3:3}: check_noncircular+0x119/0x160 check_prev_add+0x195/0x430 __lock_acquire+0xaf0/0xd80 lock_acquire.part.0+0x103/0x280 down_read+0x95/0x200 filemap_fault+0x26e/0x8b0 __do_fault+0x57/0xd0 do_pte_missing+0x23b/0x320 __handle_mm_fault+0x2d4/0x320 handle_mm_fault+0x14f/0x260 do_user_addr_fault+0x2a2/0x500 exc_page_fault+0x71/0x90 asm_exc_page_fault+0x22/0x30 other info that might help us debug this: Chain exists of: mapping.invalidate_lock#3 --> &mm->mmap_lock --> &vma->vm_lock->lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- rlock(&vma->vm_lock->lock); lock(&mm->mmap_lock); lock(&vma->vm_lock->lock); rlock(mapping.invalidate_lock#3); * DEADLOCK * 1 lock held by fsstress/6050: #0: ffff888113f26d18 (&vma->vm_lock->lock){++++}-{3:3}, at: lock_vma_under_rcu+0x165/0x250 stack backtrace: CPU: 0 PID: 6050 Comm: fsstress Not tainted 6.10.0-build2+ #956 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: <TASK> dump_stack_lvl+0x57/0x80 check_noncircular+0x119/0x160 ? queued_spin_lock_slowpath+0x4be/0x510 ? __pfx_check_noncircular+0x10/0x10 ? __pfx_queued_spin_lock_slowpath+0x10/0x10 ? mark_lock+0x47/0x160 ? init_chain_block+0x9c/0xc0 ? add_chain_block+0x84/0xf0 check_prev_add+0x195/0x430 __lock_acquire+0xaf0/0xd80 ? __pfx___lock_acquire+0x10/0x10 ? __lock_release.isra.0+0x13b/0x230 lock_acquire.part.0+0x103/0x280 ? filemap_fault+0x26e/0x8b0 ? __pfx_lock_acquire.part.0+0x10/0x10 ? rcu_is_watching+0x34/0x60 ? lock_acquire+0xd7/0x120 down_read+0x95/0x200 ? filemap_fault+0x26e/0x8b0 ? __pfx_down_read+0x10/0x10 ? __filemap_get_folio+0x25/0x1a0 filemap_fault+0x26e/0x8b0 ? __pfx_filemap_fault+0x10/0x10 ? find_held_lock+0x7c/0x90 ? __pfx___lock_release.isra.0+0x10/0x10 ? __pte_offset_map+0x99/0x110 __do_fault+0x57/0xd0 do_pte_missing+0x23b/0x320 __handle_mm_fault+0x2d4/0x320 ? __pfx___handle_mm_fault+0x10/0x10 handle_mm_fault+0x14f/0x260 do_user_addr_fault+0x2a2/0x500 exc_page_fault+0x71/0x90 asm_exc_page_fault+0x22/0x30 Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/2136178.1721725194@warthog.procyon.org.uk cc: Alexander Viro <viro@zeniv.linux.org.uk> cc: Christian Brauner <brauner@kernel.org> cc: Jan Kara <jack@suse.cz> cc: Jeff Layton <jlayton@kernel.org> cc: Gao Xiang <xiang@kernel.org> cc: Matthew Wilcox <willy@infradead.org> cc: netfs@lists.linux.dev cc: linux-erofs@lists.ozlabs.org cc: linux-fsdevel@vger.kernel.org [brauner: fix minor issues] Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-24 10:53:14 +02:00
Jann Horn	f8138f2ad2	filelock: Fix fcntl/close race recovery compat path When I wrote commit 3cad1bc01041 ("filelock: Remove locks reliably when fcntl/close race is detected"), I missed that there are two copies of the code I was patching: The normal version, and the version for 64-bit offsets on 32-bit kernels. Thanks to Greg KH for stumbling over this while doing the stable backport... Apply exactly the same fix to the compat path for 32-bit kernels. Fixes: c293621bbf67 ("[PATCH] stale POSIX lock handling") Cc: stable@kernel.org Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2563 Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20240723-fs-lock-recover-compatfix-v1-1-148096719529@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-24 10:53:14 +02:00
Christian Brauner	8eac5358ad	fs: use all available ids The counter is unconditionally incremented for each mount allocation. If we set it to 1ULL << 32 we're losing 4294967296 as the first valid non-32 bit mount id. Link: https://lore.kernel.org/r/20240719-work-mount-namespace-v1-1-834113cab0d2@kernel.org Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-24 10:53:13 +02:00
David Howells	51d37982bb	cachefiles: Set the max subreq size for cache writes to MAX_RW_COUNT Set the maximum size of a subrequest that writes to cachefiles to be MAX_RW_COUNT so that we don't overrun the maximum write we can make to the backing filesystem. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/1599005.1721398742@warthog.procyon.org.uk cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-24 10:53:13 +02:00
David Howells	212be98aa1	netfs: Fix writeback that needs to go to both server and cache When netfslib is performing writeback (ie. ->writepages), it maintains two parallel streams of writes, one to the server and one to the cache, but it doesn't mark either stream of writes as active until it gets some data that needs to be written to that stream. This is done because some folios will only be written to the cache (e.g. copying to the cache on read is done by marking the folios and letting writeback do the actual work) and sometimes we'll only be writing to the server (e.g. if there's no cache). Now, since we don't actually dispatch uploads and cache writes in parallel, but rather flip between the streams, depending on which has the lowest so-far-issued offset, and don't wait for the subreqs to finish before flipping, we can end up in a situation where, say, we issue a write to the server and this completes before we start the write to the cache. But because we only activate a stream when we first add a subreq to it, the result collection code may run before we manage to activate the stream - resulting in the folio being cleaned and having the writeback-in-progress mark removed. At this point, the folio no longer belongs to us. This is only really a problem for folios that need to be written to both streams - and in that case, the upload to the server is started first, followed by the write to the cache - and the cache write may see a bad folio. Fix this by activating the cache stream up front if there's a cache available. If there's a cache, then all data is going to be written to it. Fixes: 288ace2f57c9 ("netfs: New writeback implementation") Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/1599053.1721398818@warthog.procyon.org.uk cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-24 10:53:13 +02:00
Christian Brauner	9b3e150464	pidfs: handle kernels without namespaces cleanly The nsproxy structure contains nearly all of the namespaces associated with a task. When a given namespace type is not supported by this kernel the rules whether the corresponding pointer in struct nsproxy is NULL or always init_<ns_type>_ns differ per namespace. Ideally, that wouldn't be the case and for all namespace types we'd always set it to init_<ns_type>_ns when the corresponding namespace type isn't supported. Make sure we handle all namespaces where the pointer in struct nsproxy can be NULL when the namespace type isn't supported. Link: https://lore.kernel.org/r/20240722-work-pidfs-e6a83030f63e@brauner Fixes: 5b08bd408534 ("pidfs: allow retrieval of namespace file descriptors") # mainline only Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-24 10:53:13 +02:00
Edward Adam Davis	f60d38cb02	pidfs: when time ns disabled add check for ioctl syzbot call pidfd_ioctl() with cmd "PIDFD_GET_TIME_NAMESPACE" and disabled CONFIG_TIME_NS, since time_ns is NULL, it will make NULL ponter deref in open_namespace. Fixes: 5b08bd408534 ("pidfs: allow retrieval of namespace file descriptors") # mainline only Reported-and-tested-by: syzbot+34a0ee986f61f15da35d@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=34a0ee986f61f15da35d Signed-off-by: Edward Adam Davis <eadavis@qq.com> Link: https://lore.kernel.org/r/tencent_7FAE8DB725EE0DD69236DDABDDDE195E4F07@qq.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-24 10:53:12 +02:00
Congjie Zhou	b40c8e7a03	vfs: correct the comments of vfs_() helpers correct the comments of vfs_() helpers in fs/namei.c, including: 1. vfs_create() 2. vfs_mknod() 3. vfs_mkdir() 4. vfs_rmdir() 5. vfs_symlink() All of them come from the same commit: 6521f8917082 "namei: prepare for idmapped mounts" The @dentry is actually the dentry of child directory rather than base directory(parent directory), and thus the @dir has to be modified due to the change of @dentry. Signed-off-by: Congjie Zhou <zcjie0802@qq.com> Link: https://lore.kernel.org/r/tencent_2FCF6CC9E10DC8A27AE58A5A0FE4FCE96D0A@qq.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-24 10:53:12 +02:00
Mateusz Guzik	5bc9ad78c2	vfs: handle __wait_on_freeing_inode() and evict() race Lockless hash lookup can find and lock the inode after it gets the I_FREEING flag set, at which point it blocks waiting for teardown in evict() to finish. However, the flag is still set even after evict() wakes up all waiters. This results in a race where if the inode lock is taken late enough, it can happen after both hash removal and wakeups, meaning there is nobody to wake the racing thread up. This worked prior to RCU-based lookup because the entire ordeal was synchronized with the inode hash lock. Since unhashing requires the inode lock, we can safely check whether it happened after acquiring it. Link: https://lore.kernel.org/v9fs/20240717102458.649b60be@kernel.org/ Reported-by: Dominique Martinet <asmadeus@codewreck.org> Fixes: 7180f8d91fcb ("vfs: add rcu-based find_inode variants for iget ops") Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240718151838.611807-1-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-24 10:52:58 +02:00
David Howells	fcad93360d	netfs: Rename CONFIG_FSCACHE_DEBUG to CONFIG_NETFS_DEBUG CONFIG_FSCACHE_DEBUG should have been renamed to CONFIG_NETFS_DEBUG, so do that now. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/1410796.1721333406@warthog.procyon.org.uk cc: Uwe Kleine-König <ukleinek@kernel.org> cc: Christian Brauner <brauner@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-24 10:15:38 +02:00
David Howells	a9d47a50cf	netfs: Revert "netfs: Switch debug logging to pr_debug()" Revert commit 163eae0fb0d4c610c59a8de38040f8e12f89fd43 to get back the original operation of the debugging macros. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240608151352.22860-2-ukleinek@kernel.org Link: https://lore.kernel.org/r/1410685.1721333252@warthog.procyon.org.uk cc: Uwe Kleine-König <ukleinek@kernel.org> cc: Christian Brauner <brauner@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-07-24 10:15:37 +02:00
Linus Torvalds	e9e969797b	execve fix for v6.11-rc1 - Move KUnit tests to tests/ subdirectory -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmagDLAACgkQiXL039xt wCaNChAAqV0R2sK9/C3T2X0BEJo0pslZuPZBG7zEWPxF87jfjBC47Gt+xtwxXOzG ylMwyX8xOVE+rNhxqgRDgFSdWJ/Ypv4gKypwilFFeLglN4b+PtMOepOM9MXTBVym qi8+Yv02pUnEeEXxW+Bz8xR/eB6LAkKQEs5ZodzqK9CMQnduxCfnYbKfoDlDqAkY fl7Npy+P5EwD5YFC70rA7hd7sx83UE7+hqjYXfCoVDeMYoCJXPYwIPFrENnjyJYD WFiLYXbSMFigVY3BACSNrDLUtFzIr1kaicU11crKSRayGAzHkbYCKeEj1puxEUGW dhzSfrzqcb1qYjlZE+tDgwS9KVPAk802SAf6QggjC/emeE37pCnd2lDmdXnVTe8G HUhdNaGnBGICBUOOuqdYW3ei4NidWlLRhmyniidsoxL8t0iWe4wMJn6WmOH/XRTt S5C6it2cN00GvKLlFz4TcnosXkWmjsmii5aNVctXW6CjYJpgi64QkOkFJEtqeltZ Gv/KSBU/5OihvdTLHEJTMOBGbH4NTi1D1VYZR2GlRvhy0/KHTxeDfKARTJNra4jR 76NAQlNqG1Bi50lm8Zb489WqlW38ua3rjK9zgrPSR739zN+H6jFyR57CEA7G1gX5 hlKJkrBEBalydH6dCt2qP3T9IlhB/bM/cx8wXnwK5sHVv1Kh1KI= =Kl2W -----END PGP SIGNATURE----- Merge tag 'execve-v6.11-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve fix from Kees Cook: "This moves the exec and binfmt_elf tests out of your way and into the tests/ subdirectory, following the newly ratified KUnit naming conventions. :)" * tag 'execve-v6.11-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: execve: Move KUnit tests to tests/ subdirectory	2024-07-23 17:30:42 -07:00
Linus Torvalds	5ad7ff8738	f2fs update for 6.11-rc1 It's a pretty small update including mostly minor bug fixes in zoned storage along with the large section support. Enhancement: - add support for FS_IOC_GETFSSYSFSPATH - enable atgc dynamically if conditions are met - use new ioprio Macro to get ckpt thread ioprio level - remove unreachable lazytime mount option parsing Bug fix: - fix null reference error when checking end of zone - fix start segno of large section - fix to cover read extent cache access with lock - don't dirty inode for readonly filesystem - allocate a new section if curseg is not the first seg in its zone - only fragment segment in the same section - truncate preallocated blocks in f2fs_file_open() - fix to avoid use SSR allocate when do defragment - fix to force buffered IO on inline_data inode And, it includes some minor code clean-ups, and sanity checks. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmagGLkACgkQQBSofoJI UNLsvA//U1u2hr+VEmSIxZ+CcM8vBM7wmbuggdUikEW0uj07YpvovLikifV7p6kK 00p/GsqIqNRsVcTxRI9wBTPiltJRei/w6K3EXnSGKgTPtq1QMSv/GKiBUUaYsRu0 F6W5AqouTquDZz61/ULhMc7WvWqUIZ1m4QX/DMEUGPSnQ2+yIsnz/PT4ZXaKBH7K lIh4WiFAyKO6/UWftcGmnvPiqj4YvqFOhLLV/fgF/VY8IVcENrDH+8+SJM2NtT0F 6gT0bN2Jscc8o43ejo6dlwc7+0qhmH7H2IOCC1XSYGCsveUYgqgKgpBP4ryKjZvt LrbYKaL+auGuJMcLYCG/6IDPl5xkJo3SuRE7YnJdeTNc3InC6BUr17pkmU8n5ib4 xKSeH2XQXk/nu3l9srtKb87Zdwjr90GgvjEZwsCTe+6ihjJ7SGWfpvVLhm3pHale SHPSLaVGqTlqdrNLtfhtNEg6xcvUVxTPbqzoCAmS6onEZfv8BldtQDSea0Tuw7UG Ic4AbfJ/gVCKyCDw/QiV0B1n8GHsVIhlBXss2/xEuO2/2Pso8YFIAXCyH0kBXIN2 0/VesfguJLBIGyyFZ2M5AGZehr5s1n2IThe+qGjeoHfNQz7Br+xBTc25VpowUenC nET3UoAmUkLFrItDMMqJbJ8DwW/Idei+YH/xnDZSKkz5rgHclsg= =4m67 -----END PGP SIGNATURE----- Merge tag 'f2fs-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "A pretty small update including mostly minor bug fixes in zoned storage along with the large section support. Enhancements: - add support for FS_IOC_GETFSSYSFSPATH - enable atgc dynamically if conditions are met - use new ioprio Macro to get ckpt thread ioprio level - remove unreachable lazytime mount option parsing Bug fixes: - fix null reference error when checking end of zone - fix start segno of large section - fix to cover read extent cache access with lock - don't dirty inode for readonly filesystem - allocate a new section if curseg is not the first seg in its zone - only fragment segment in the same section - truncate preallocated blocks in f2fs_file_open() - fix to avoid use SSR allocate when do defragment - fix to force buffered IO on inline_data inode And some minor code clean-ups and sanity checks" * tag 'f2fs-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (26 commits) f2fs: clean up addrs_per_{inode,block}() f2fs: clean up F2FS_I() f2fs: use meta inode for GC of COW file f2fs: use meta inode for GC of atomic file f2fs: only fragment segment in the same section f2fs: fix to update user block counts in block_operations() f2fs: remove unreachable lazytime mount option parsing f2fs: fix null reference error when checking end of zone f2fs: fix start segno of large section f2fs: remove redundant sanity check in sanity_check_inode() f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid f2fs: fix to use mnt_{want,drop}_write_file replace file_{start,end}_wrtie f2fs: clean up set REQ_RAHEAD given rac f2fs: enable atgc dynamically if conditions are met f2fs: fix to truncate preallocated blocks in f2fs_file_open() f2fs: fix to cover read extent cache access with lock f2fs: fix return value of f2fs_convert_inline_inode() f2fs: use new ioprio Macro to get ckpt thread ioprio level f2fs: fix to don't dirty inode for readonly filesystem f2fs: fix to avoid use SSR allocate when do defragment ...	2024-07-23 15:21:19 -07:00
Linus Torvalds	371c141464	Folio conversion from Matthew Wilcox and a few various fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAmaeu2MACgkQNqiEXrVA jGQobw//Vx99neGOazZEF3WLiD5dtOkOLwYVj9rIuMznXdxo9dwhkdjZNvzA8KYH 2P2RvDKBO3PmkACbPqcnbwkAbbeWuTjxaUFPmzTHvKFRk+Yrjwo4AIFykG5cUcDb IBkIAJB1QX5JcpGbvrGvzAm2R5/u+JkzA1HN87c7I4wwfAd3AiWpKveYGNW36hvW q7mlp4oy5UAkhj7cKc3nMaJ8n3D0xFczTvGIUGrpZWE4RrGivgRq++vfwOmO2uor DGNqu1f+ExK3GhU/4NoO6cxcTGEys43ubre7HjqPJC9BydTHz/0iBfy71XPBH+er se0JvfREVs+hixq6lZqYb1AT0FDG3eghxScXBhIm6oz4DOP9MKSpRdy7m+C93c6I UKvy8CD/UElltUpqfY0uqpcAyf7XKeRhitipkpPRm5ogkZ3HUFXq8o85SLZFSxPq GTDXT1JT8Ze02AhTvmLYasTJCBTmXQ+iW/aKl0Z5aPNoQTaRQmWrrpnfRx64tqEG 53qKi5II56IXdwDe4TSyeyJVzI7diOTe3EqZEzddJZAFo1bFhIKA6jJhPgZF0eCj MW98t+cJ3Z6CSLfU09usQxxRFY4BF7PBsj0wR8S/aU/hDyTDMUBZB4ypmtlYDGk8 Hvm64OdTpAx5kA4APZiAcZ/vTIoP/NXsN5GH1ogwPZFUWNQqHwY= =wYNC -----END PGP SIGNATURE----- Merge tag 'jfs-6.11' of github.com:kleikamp/linux-shaggy Pull jfs updates from David Kleikamp: "Folio conversion from Matthew Wilcox and a few various fixes" * tag 'jfs-6.11' of github.com:kleikamp/linux-shaggy: jfs: don't walk off the end of ealist jfs: Fix shift-out-of-bounds in dbDiscardAG jfs: Fix array-index-out-of-bounds in diFree jfs: fix null ptr deref in dtInsertEntry jfs: Remove use of folio error flag fs: Remove i_blocks_per_page jfs: Change metapage->page to metapage->folio jfs: Convert force_metapage to use a folio jfs: Convert inc_io to take a folio jfs: Convert page_to_mp to folio_to_mp jfs; Convert __invalidate_metapages to use a folio jfs: Convert dec_io to take a folio jfs: Convert drop_metapage and remove_metapage to take a folio jfs; Convert release_metapage to use a folio jfs: Convert insert_metapage() to take a folio jfs: Convert __get_metapage to use a folio jfs: Convert metapage_writepage to metapage_write_folio jfs: Convert metapage_read_folio to use folio APIs	2024-07-23 15:15:16 -07:00
Linus Torvalds	ca83c61cb3	Kbuild updates for v6.11 - Remove tristate choice support from Kconfig - Stop using the PROVIDE() directive in the linker script - Reduce the number of links for the combination of CONFIG_DEBUG_INFO_BTF and CONFIG_KALLSYMS - Enable the warning for symbol reference to .exit.* sections by default - Fix warnings in RPM package builds - Improve scripts/make_fit.py to generate a FIT image with separate base DTB and overlays - Improve choice value calculation in Kconfig - Fix conditional prompt behavior in choice in Kconfig - Remove support for the uncommon EMAIL environment variable in Debian package builds - Remove support for the uncommon "name <email>" form for the DEBEMAIL environment variable - Raise the minimum supported GNU Make version to 4.0 - Remove stale code for the absolute kallsyms - Move header files commonly used for host programs to scripts/include/ - Introduce the pacman-pkg target to generate a pacman package used in Arch Linux - Clean up Kconfig -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmagBLUVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsGmoUQAJ8pnURs0g+Rcyk6bdY/qtXBYkS+ nXpIK1ssFgRRgAQdeszYtvBqLFzb0wRCSie87G1AriD/JkVVTjCCY1For1y+vs0u a7HfxitHhZpPyZW/T+WMQ3LViNccpkx+DFAcoRH8xOY/XPEJKVUby332jOIXMuyg +NKIELQJVsLhcDofTUGb5VfIQektw219n5c4jKjXdNk4ZtE24xCRM5X528ZebwWJ RZhMvJ968PyIH1IRXvNt6dsKBxoGIwPP8IO6yW9hzHaNsBqt7MGSChSel7r1VKpk iwCNApJvEiVBe5wvTSVOVro7/8p/AZ70CQAqnMJV+dNnRqtGqW7NvL6XAjZRJgJJ Uxe5NSrXgQd3FtqfcbXLetBgp9zGVt328nHm1HXHR5rFsvoOiTvO7hHPbhA+OoWJ fs+jHzEXdAMRgsNrczPWU5Svq6MgGe4v8HBf0m8N1Uy65t/O+z9ti2QAw7kIFlbu /VSFNjw4CHmNxGhnH0khCMsy85FwVIt9Ux+2d6IEc0gP8S1Qa1HgHGAoVI4U51eS 9dxEPVJNPOugaIVHheuS3wimEO6wzaJcQHn4IXaasMA7P6Yo4G/jiGoy4cb9qPTM Hb+GaOltUy7vDoG4D2LSym8zR8rdKwbIf/5psdZrq/IWVKq5p+p7KWs3aOykSoM7 o6Hb532Ioalhm8je =BYu7 -----END PGP SIGNATURE----- Merge tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Remove tristate choice support from Kconfig - Stop using the PROVIDE() directive in the linker script - Reduce the number of links for the combination of CONFIG_KALLSYMS and CONFIG_DEBUG_INFO_BTF - Enable the warning for symbol reference to .exit.* sections by default - Fix warnings in RPM package builds - Improve scripts/make_fit.py to generate a FIT image with separate base DTB and overlays - Improve choice value calculation in Kconfig - Fix conditional prompt behavior in choice in Kconfig - Remove support for the uncommon EMAIL environment variable in Debian package builds - Remove support for the uncommon "name <email>" form for the DEBEMAIL environment variable - Raise the minimum supported GNU Make version to 4.0 - Remove stale code for the absolute kallsyms - Move header files commonly used for host programs to scripts/include/ - Introduce the pacman-pkg target to generate a pacman package used in Arch Linux - Clean up Kconfig * tag 'kbuild-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (65 commits) kbuild: doc: gcc to CC change kallsyms: change sym_entry::percpu_absolute to bool type kallsyms: unify seq and start_pos fields of struct sym_entry kallsyms: add more original symbol type/name in comment lines kallsyms: use \t instead of a tab in printf() kallsyms: avoid repeated calculation of array size for markers kbuild: add script and target to generate pacman package modpost: use generic macros for hash table implementation kbuild: move some helper headers from scripts/kconfig/ to scripts/include/ Makefile: add comment to discourage tools/* addition for kernel builds kbuild: clean up scripts/remove-stale-files kconfig: recursive checks drop file/lineno kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec kallsyms: get rid of code for absolute kallsyms kbuild: Create INSTALL_PATH directory if it does not exist kbuild: Abort make on install failures kconfig: remove 'e1' and 'e2' macros from expression deduplication kconfig: remove SYMBOL_CHOICEVAL flag kconfig: add const qualifiers to several function arguments kconfig: call expr_eliminate_yn() at least once in expr_eliminate_dups() ...	2024-07-23 14:32:21 -07:00
Kees Cook	b6f5ee4d53	execve: Move KUnit tests to tests/ subdirectory Move the exec KUnit tests into a separate directory to avoid polluting the local directory namespace. Additionally update MAINTAINERS for the new files. Reviewed-by: David Gow <davidgow@google.com> Reviewed-by: SeongJae Park <sj@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20240720170310.it.942-kees@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>	2024-07-22 18:25:47 -07:00
Linus Torvalds	dd018c238b	bcachefs fixes for 6.11-rc1 - another fix for fsck getting stuck, from marcin - small syzbot fix - another undefined shift fix -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmaekrUACgkQE6szbY3K bnZoLg//Sbdo0JsUJIDU3gnmyEMmWx1mvUT7MJS2EwwLpLR2t1oNcE5CB9aNdv2d fhJxjhuBnc9Z1yO83eQSinUcMGdpM3QIS3BH1dwMa2kY5jE0cfxvdoXX+2CDVzPe 6/0SgX+0Ce2X7MxQSI8Nu9RhNWtEwFqdtOoOanBWHzjBPNDC5+ZuocZAnqMyM0cu KYmtZ1lyGwa23ILwaKWuZopXj8jd62FU+X49PWpzLvOLM+1BWwYqOrL0ZgNokkLc LoalSWmdVDTXCs6dteDO++nwLoPJbcYgwv7fbB6yiHj0xs/bwRgu0FHVzd/1tjMM O8VY/9/el+EbVE2VkA6JMHe6IMRJS6Gf6c78MfwwPPtcOyinM8AVlLNa1WLH6Sjw XfCKzENnM6CHSfefY3IcYrKHQ3htdZNw7nvnz2PTwP8KejHBkOgmde3Dqhn6khKa R3U3nR9hc5lOvN6Y7EuLmw/tLvab6NjNdm5xSIo9Tpg2GjpsnITZL7hSKOAG1ua/ 7JxWGND+SrDgU/bEv+kRsOTRDgx81uOMQ1IUMmW5CwFPj61K1X3q4SBwjxeopC3Q CQ9IpkK/twLai8ENSy37HqeSzqbLCsJFILR3q68SlyE7KSuGFdR6ySHX0NYQFY1L TDTJcajUB0O23xlL7WEIyeH3pGx6+adS5YYsk0dR9s5o7UEn84g= =CKAY -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-07-22' of https://evilpiepirate.org/git/bcachefs Pull bcachefs fixes from Kent Overstreet: - another fix for fsck getting stuck, from marcin - small syzbot fix - another undefined shift fix * tag 'bcachefs-2024-07-22' of https://evilpiepirate.org/git/bcachefs: bcachefs: Fix printbuf usage while atomic bcachefs: More informative error message in reattach_inode() bcachefs: kill btree_trans_too_many_iters() in bch2_bucket_alloc_freelist() bcachefs: mean_and_variance: Avoid too-large shift amounts	2024-07-22 10:59:08 -07:00
Linus Torvalds	5ea6d72489	ntfs3 changes for 6.11-rc1 Added: simple fileattr has been implement. Fixed: transform resident to nonresident for compressed files; the format of the "nocase" mount option; getting file type; many other internal bugs. Refactored: unused function and macros have been removed; partial transition from page to folio (suggested by Matthew Wilcox); legacy ntfs support. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEh0DEKNP0I9IjwfWEqbAzH4MkB7YFAmaeJa0ACgkQqbAzH4Mk B7b0AQ/7BboEmiLXjgstyF6ZQcjT3NamN09OLOTu2SMlGhqcOjIn1xNNMCJPRdVR KJ+/LG1jYMqdyjsR7n5jt8Bxn4njLInPD3+X89NcKPVJxWFYQvg/5pzZw5YF3XGA G9f0ky1nvSrTLCTWuGktfkFJ5xBg9J7qkvnjiUsXUtPrf6SS2hH5z2uVbKMOfua9 xSoe76hrxVxfpYTfk2ERtS8b1gcQUnT/JrAO88yboNQ8i4IVqdxYUklhHoXZnUHr YZHcg04SD9+oqrK5eR2xx2belBEVPVoD0BJZkLQyNNqBfVwNzPXc9fkkpyc9Co5R duRyXPzc9qYkFVCsGY9N73F+2lPmruGcz5TYgn1EOZUb00dOdZd2LF/J96lRmwpC hICtwbNgMWp2a9jAC1G0lNhmvHHiFZgR5o7N3YmkMUaCCcT1Z3jeCCQ2JGE0cNCf lwaMda6jxl1f7QLUP53qlgTb+RlVbVpPFD0JQ7mEeZyRuZepgsk6xkHQ2G18nicQ hmui6t+tQYMgoosjOAy8rlTtcK5A3gKUbYyeStCUNVANEM6XPUpjF0lRQuwN2bae p/nTjLBESp6ulq1MIFZ2h6G3OlX+vkezHULo80QWfoIEuMtkZ+MyypmNUDqlBE5n pDyE8bcqCVI4ODGzIj8aoZSDggQYM49SqDzmoNc7n2hhFkS8Drk= =sA5e -----END PGP SIGNATURE----- Merge tag 'ntfs3_for_6.11' of https://github.com/Paragon-Software-Group/linux-ntfs3 Pull ntfs3 updates from Konstantin Komarov: "New code: - simple fileattr support Fixes: - transform resident to nonresident for compressed files - the format of the "nocase" mount option - getting file type - many other internal bugs Refactoring: - remove unused functions and macros - partial transition from page to folio (suggested by Matthew Wilcox) - legacy ntfs support" * tag 'ntfs3_for_6.11' of https://github.com/Paragon-Software-Group/linux-ntfs3: (42 commits) fs/ntfs3: Fix formatting, change comments, renaming fs/ntfs3: Update log->page_{mask,bits} if log->page_size changed fs/ntfs3: Implement simple fileattr fs/ntfs3: Redesign legacy ntfs support fs/ntfs3: Use function file_inode to get inode from file fs/ntfs3: Minor ntfs_list_ea refactoring fs/ntfs3: Check more cases when directory is corrupted fs/ntfs3: Do copy_to_user out of run_lock fs/ntfs3: Keep runs for $MFT::$ATTR_DATA and $MFT::$ATTR_BITMAP fs/ntfs3: Missed error return fs/ntfs3: Fix the format of the "nocase" mount option fs/ntfs3: Fix field-spanning write in INDEX_HDR ntfs3: Convert attr_wof_frame_info() to use a folio ntfs3: Convert ni_readpage_cmpr() to take a folio ntfs3: Convert ntfs_get_frame_pages() to use a folio ntfs3: Remove calls to set/clear the error flag ntfs3: Convert attr_make_nonresident to use a folio ntfs3: Convert attr_data_write_resident to use a folio ntfs3: Convert ntfs_write_end() to work on a folio ntfs3: Convert attr_data_read_resident() to take a folio ...	2024-07-22 10:50:18 -07:00
Kent Overstreet	737759fc09	bcachefs: Fix printbuf usage while atomic Reported-by: syzbot+f765e51170cf13493f0b@syzkaller.appspotmail.com Fixes: f12410bb7ddd ("bcachefs: Add an error message for insufficient rw journal devs") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-22 11:27:15 -04:00
Kent Overstreet	7a086baad0	bcachefs: More informative error message in reattach_inode() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-22 11:27:15 -04:00
Linus Torvalds	933069701c	four ksmbd server fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmadTDUACgkQiiy9cAdy T1Hzugv/UTw9ERSzZNtYOOuM+5EtvYxqxGLiGaaVbQaGzDoNW5hgfIoWwvllaPHP 4lmHH2Nsz0B2Cg0fSKBbTWZ7pxQ4QUuCuwhgcKVZyYnuikf1qSMPgOBb5T2JkuTG qu0GX+dFdoak6RiLZ8vSfUsQ1IzvuyLcXrPDdvwfE/eV3NKGLM8CevkpULSNGKwz P2vpOu9oN0fhrHP8rXWRrNCLma4056TYFYDRpRqWxiTJr12JvXmOyjlovmEBx12K H1plz3ltLQcFj5w0dnYSAY8jijEICITeNBxD0aP6pQ6Ah2C1pUEES2Lr2JG/OYt0 O4nkUGpbWShi70rCTnWbXOWQU7mbmtSqhxob0Z6wUdrHRZUUoWLr3WQaIHJHfOmY 5UgiHoiiV98wtBkrja/Ex/O9GdOKpdEVlM9M3wJR9D6YAeZSYKB2rLweGs6QtgrU HRFCNZmJM0zPpsT2SUQDanOiODShAqoGcPQgBuEAVhs4TqQz2rTlPTrodhXNI5WF RJKin/uq =CggG -----END PGP SIGNATURE----- Merge tag '6.11-rc-smb3-server-fixes' of git://git.samba.org/ksmbd Pull smb server fixes from Steve French: - two durable handle improvements - two small cleanup patches * tag '6.11-rc-smb3-server-fixes' of git://git.samba.org/ksmbd: ksmbd: add durable scavenger timer ksmbd: avoid reclaiming expired durable opens by the client ksmbd: Constify struct ksmbd_transport_ops ksmbd: remove duplicate SMB2 Oplock levels definitions	2024-07-21 20:50:39 -07:00
Linus Torvalds	527eff227d	- In the series "treewide: Refactor heap related implementation", Kuan-Wei Chiu has significantly reworked the min_heap library code and has taught bcachefs to use the new more generic implementation. - Yury Norov's series "Cleanup cpumask.h inclusion in core headers" reworks the cpumask and nodemask headers to make things generally more rational. - Kuan-Wei Chiu has sent along some maintenance work against our sorting library code in the series "lib/sort: Optimizations and cleanups". - More library maintainance work from Christophe Jaillet in the series "Remove usage of the deprecated ida_simple_xx() API". - Ryusuke Konishi continues with the nilfs2 fixes and clanups in the series "nilfs2: eliminate the call to inode_attach_wb()". - Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix GDB command error". - Plus the usual shower of singleton patches all over the place. Please see the relevant changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2GvwAKCRDdBJ7gKXxA jlf/AP48xP5ilIHbtpAKm2z+MvGuTxJQ5VSC0UXFacuCbc93lAEA+Yo+vOVRmh6j fQF2nVKyKLYfSz7yqmCyAaHWohIYLgg= =Stxz -----END PGP SIGNATURE----- Merge tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - In the series "treewide: Refactor heap related implementation", Kuan-Wei Chiu has significantly reworked the min_heap library code and has taught bcachefs to use the new more generic implementation. - Yury Norov's series "Cleanup cpumask.h inclusion in core headers" reworks the cpumask and nodemask headers to make things generally more rational. - Kuan-Wei Chiu has sent along some maintenance work against our sorting library code in the series "lib/sort: Optimizations and cleanups". - More library maintainance work from Christophe Jaillet in the series "Remove usage of the deprecated ida_simple_xx() API". - Ryusuke Konishi continues with the nilfs2 fixes and clanups in the series "nilfs2: eliminate the call to inode_attach_wb()". - Kuan-Ying Lee has some fixes to the gdb scripts in the series "Fix GDB command error". - Plus the usual shower of singleton patches all over the place. Please see the relevant changelogs for details. * tag 'mm-nonmm-stable-2024-07-21-15-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (98 commits) ia64: scrub ia64 from poison.h watchdog/perf: properly initialize the turbo mode timestamp and rearm counter tsacct: replace strncpy() with strscpy() lib/bch.c: use swap() to improve code test_bpf: convert comma to semicolon init/modpost: conditionally check section mismatch to __meminit* init: remove unused __MEMINIT* macros nilfs2: Constify struct kobj_type nilfs2: avoid undefined behavior in nilfs_cnt32_ge macro math: rational: add missing MODULE_DESCRIPTION() macro lib/zlib: add missing MODULE_DESCRIPTION() macro fs: ufs: add MODULE_DESCRIPTION() lib/rbtree.c: fix the example typo ocfs2: add bounds checking to ocfs2_check_dir_entry() fs: add kernel-doc comments to ocfs2_prepare_orphan_dir() coredump: simplify zap_process() selftests/fpu: add missing MODULE_DESCRIPTION() macro compiler.h: simplify data_race() macro build-id: require program headers to be right after ELF header resource: add missing MODULE_DESCRIPTION() ...	2024-07-21 17:56:22 -07:00
Linus Torvalds	fbc90c042c	- 875fa64577da ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers") is known to cause a performance regression (https://lore.kernel.org/all/3acefad9-96e5-4681-8014-827d6be71c7a@linux.ibm.com/T/#mfa809800a7862fb5bdf834c6f71a3a5113eb83ff). Yu has a fix which I'll send along later via the hotfixes branch. - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page(), vm_map_pages() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd\|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Is anyone reading this stuff? If so, email me! - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2C+QAKCRDdBJ7gKXxA joTkAQDvjqOoFStqk4GU3OXMYB7WCU/ZQMFG0iuu1EEwTVDZ4QEA8CnG7seek1R3 xEoo+vw0sWWeLV3qzsxnCA1BJ8cTJA8= =z0Lf -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page(), vm_map_pages() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd\|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...	2024-07-21 17:15:46 -07:00
Linus Torvalds	33c9de2960	six smb3 client fixes, most for stable including important netfs fixes -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmadThwACgkQiiy9cAdy T1FUgwv+IwziaaGLNdl6krOfF5z28bpltqjD2ijE3MD2doLrJeUwQ2EhrfF0Ye1M vj2w5Pi7j7zZLkRXm9D/yrB4CJqw4Zksj0BMRn4QWW2Cbrk8mG0MgaSz2K6tuhFx OQx+QLtvxjHvFB2AzHhEdjPqb3Sf5zsyXa8QtzQnLhPj/awikExACmIfYoeN65+V 8yWNB3LJLtBlHu3bMqyKDNa7gU7E2ctzUJ4vmTy7HrWMGPLuX/RIraI43zFSHhne I4gzyQiy/VZ4RCXUJac00FofLEsfKKxdFBKmRcSLHCxAVIjxzqSOFvF5qCL8N6aw 4DG0zV0B2xaCfvAOQrB/aWLDIRThUOcDCw6BOxXoHOE4EnxjiCaK1/79iPz2mx0y XdnIuvNYjUVM9fmfDn6VxCPcn3OpSgKTErTHWaqnLKGtw8EL+VbYdE/WCwt3DrY9 nUlzRwTCaBWeiQk7jjPyAwNiMqsMHa03ya07ElNnTlBJsM1LsZVwk7HnjS2crIuA gZXRFS+/ =9YeP -----END PGP SIGNATURE----- Merge tag '6.11-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client fixes from Steve French: "Six smb3 client fixes, most for stable including important netfs fixes: - various netfs related fixes for cifs addressing some regressions in 6.10 (e.g. generic/708 and some multichannel crediting related issues) - fix for a noisy log message on copy_file_range - add trace point for read/write credits" * tag '6.11-rc-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix missing fscache invalidation cifs: Add a tracepoint to track credits involved in R/W requests cifs: Fix setting of zero_point after DIO write cifs: Fix missing error code set cifs: Fix server re-repick on subrequest retry cifs: fix noisy message on copy_file_range	2024-07-21 15:23:39 -07:00
David Howells	a07d38afd1	cifs: Fix missing fscache invalidation A network filesystem needs to implement a netfslib hook to invalidate fscache if it's to be able to use the cache. Fix cifs to implement the cache invalidation hook. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Signed-off-by: Steve French <stfrench@microsoft.com>	2024-07-20 13:55:29 -05:00
Linus Torvalds	53a5182c8a	for-6.11-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmaak1cACgkQxWXV+ddt WDvtDw/9GLN/pzAhH+rg0v51b4Ofxqa5QXuz7GIHb+UOqOdiZIjun+UmBst8794t oZ0rhO1sjXVNwci5TG2oWUHcoAkjoPrVgJ/X5nzBHOzHJCQlV1kNzgsDyY+HFL9F FBkJ6u1pCXUIxry6zp55NLWLMcjtFdMphjLXENh+GzUx5aAXl7VamK41B3+pHGfW gDCNJ41vaaHqTGN8+9AADpZ2Oo4CUCtr8eNxGXZ0O5/0s5CdRJrjphtBTcGwbGch SYVceySM4ElLmFi+hKFxx68MkWgnuTnBhZhbOM4V2fPFv0hLzXyQ7OrAk+MY8O2m AHLx2jHKfCdONFeTUdt1cY5/wM6Afhy+N/iKtv3t2+Wi70C/LEHQRY1Op3rfx1rn vOIbR9IXEHVi6ncO/E9c4LacSqLd/KSOaZn2Z/6i5wN+NY86CrCMuPwr6Pv0LL6x aSHka47SFFTQLvHUdwmzexJ2YuosFdI0BhpiEu8ylAZTJ17yDJatk8wM+FB6Rfh1 vdPMRi93nVfrCwkU63Y2gqtJ3ncb3mbk/0uUdtMMflZJgjL0qkxTmcu3pSEdzIYR gHFLVlns4cljL5PB9yMH/JjYjYn+Y6bCVvVyuhQZ3FAanUVOSFin/YWfo1bIi1et ENNP+lhUKYvKLcz3QcnQpX3a6PkFPrmFi5wniAvxymrmVKJ3g3E= =ROO7 -----END PGP SIGNATURE----- Merge tag 'for-6.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "A fix for build breakage on 32bit platforms" * tag 'for-6.11-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: change BTRFS_MOUNT_* flags to 64bit type	2024-07-19 14:34:52 -07:00
Linus Torvalds	f4f92db439	virtio: features, fixes, cleanups Several new features here: - Virtio find vqs API has been reworked (required to fix the scalability issue we have with adminq, which I hope to merge later in the cycle) - vDPA driver for Marvell OCTEON - virtio fs performance improvement - mlx5 migration speedups Fixes, cleanups all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmaXjQQPHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpnIsH/jVNqAQbe/vaBQdNMdnsA+P9A9unLbYRxYCQ tN73mQRIXKtnZHBRAEbMGq52HPYg8HlN2HJSgyNo6I6t8VD+PiOco7m+3GpmqEcW aXPOPl0BAbVoDgyutxRuuodP8Z61lBx0mG6iOxpzTXOPGlpQqtPCFHO8YnodqnPf tMix/5uAqgZKV2siCbw5DtzwEc0gDHU8qsD0/nyoS5nBDF9yh/ardr5P/qiyFDQH atCNYTOhIFU83pLAaw0fpCGbkt7gxf+5RpWVx3wkYww+/MwvYhsveRvQyaGbBz3n WDtET3SOtVTta98OAGIKCq/2z8f6mYXBP7vXapBgnJG3vwS/poQ= =LYua -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: "Several new features here: - Virtio find vqs API has been reworked (required to fix the scalability issue we have with adminq, which I hope to merge later in the cycle) - vDPA driver for Marvell OCTEON - virtio fs performance improvement - mlx5 migration speedups Fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (56 commits) virtio: rename virtio_find_vqs_info() to virtio_find_vqs() virtio: remove unused virtio_find_vqs() and virtio_find_vqs_ctx() helpers virtio: convert the rest virtio_find_vqs() users to virtio_find_vqs_info() virtio_balloon: convert to use virtio_find_vqs_info() virtiofs: convert to use virtio_find_vqs_info() scsi: virtio_scsi: convert to use virtio_find_vqs_info() virtio_net: convert to use virtio_find_vqs_info() virtio_crypto: convert to use virtio_find_vqs_info() virtio_console: convert to use virtio_find_vqs_info() virtio_blk: convert to use virtio_find_vqs_info() virtio: rename find_vqs_info() op to find_vqs() virtio: remove the original find_vqs() op virtio: call virtio_find_vqs_info() from virtio_find_single_vq() directly virtio: convert find_vqs() op implementations to find_vqs_info() virtio_pci: convert vp_*find_vqs() ops to find_vqs_info() virtio: introduce virtio_queue_info struct and find_vqs_info() config op virtio: make virtio_find_single_vq() call virtio_find_vqs() virtio: make virtio_find_vqs() call virtio_find_vqs_ctx() caif_virtio: use virtio_find_single_vq() for single virtqueue finding vdpa/mlx5: Don't enable non-active VQs in .set_vq_ready() ...	2024-07-19 11:57:55 -07:00
Jason A. Donenfeld	9651fcedf7	mm: add MAP_DROPPABLE for designating always lazily freeable mappings The vDSO getrandom() implementation works with a buffer allocated with a new system call that has certain requirements: - It shouldn't be written to core dumps. * Easy: VM_DONTDUMP. - It should be zeroed on fork. * Easy: VM_WIPEONFORK. - It shouldn't be written to swap. * Uh-oh: mlock is rlimited. * Uh-oh: mlock isn't inherited by forks. - It shouldn't reserve actual memory, but it also shouldn't crash when page faulting in memory if none is available * Uh-oh: VM_NORESERVE means segfaults. It turns out that the vDSO getrandom() function has three really nice characteristics that we can exploit to solve this problem: 1) Due to being wiped during fork(), the vDSO code is already robust to having the contents of the pages it reads zeroed out midway through the function's execution. 2) In the absolute worst case of whatever contingency we're coding for, we have the option to fallback to the getrandom() syscall, and everything is fine. 3) The buffers the function uses are only ever useful for a maximum of 60 seconds -- a sort of cache, rather than a long term allocation. These characteristics mean that we can introduce VM_DROPPABLE, which has the following semantics: a) It never is written out to swap. b) Under memory pressure, mm can just drop the pages (so that they're zero when read back again). c) It is inherited by fork. d) It doesn't count against the mlock budget, since nothing is locked. e) If there's not enough memory to service a page fault, it's not fatal, and no signal is sent. This way, allocations used by vDSO getrandom() can use: VM_DROPPABLE \| VM_DONTDUMP \| VM_WIPEONFORK \| VM_NORESERVE And there will be no problem with OOMing, crashing on overcommitment, using memory when not in use, not wiping on fork(), coredumps, or writing out to swap. In order to let vDSO getrandom() use this, expose these via mmap(2) as MAP_DROPPABLE. Note that this involves removing the MADV_FREE special case from sort_folio(), which according to Yu Zhao is unnecessary and will simply result in an extra call to shrink_folio_list() in the worst case. The chunk removed reenables the swapbacked flag, which we don't want for VM_DROPPABLE, and we can't conditionalize it here because there isn't a vma reference available. Finally, the provided self test ensures that this is working as desired. Cc: linux-mm@kvack.org Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2024-07-19 20:22:12 +02:00
David Howells	519be98971	cifs: Add a tracepoint to track credits involved in R/W requests Add a tracepoint to track the credit changes and server in_flight value involved in the lifetime of a R/W request, logging it against the request/subreq debugging ID. This requires the debugging IDs to be recorded in the cifs_credits struct. The tracepoint can be enabled with: echo 1 >/sys/kernel/debug/tracing/events/cifs/smb3_rw_credits/enable Also add a three-state flag to struct cifs_credits to note if we're interested in determining when the in_flight contribution ends and, if so, to track whether we've decremented the contribution yet. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2024-07-19 11:08:57 -05:00
David Howells	61ea6b3a31	cifs: Fix setting of zero_point after DIO write At the moment, at the end of a DIO write, cifs calls netfs_resize_file() to adjust the size of the file if it needs it. This will reduce the zero_point (the point above which we assume a read will just return zeros) if it's more than the new i_size, but won't increase it. With DIO writes, however, we definitely want to increase it as we have clobbered the local pagecache and then written some data that's not available locally. Fix cifs to make the zero_point above the end of a DIO or unbuffered write. This fixes corruption seen occasionally with the generic/708 xfs-test. In that case, the read-back of some of the written data is being short-circuited and replaced with zeroes. Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Cc: stable@vger.kernel.org Reported-by: Steve French <sfrench@samba.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2024-07-19 11:08:57 -05:00
David Howells	d2c5eb57b6	cifs: Fix missing error code set In cifs_strict_readv(), the default rc (-EACCES) is accidentally cleared by a successful return from netfs_start_io_direct(), such that if cifs_find_lock_conflict() fails, we don't return an error. Fix this by resetting the default error code. Fixes: 14b1cd25346b ("cifs: Fix locking in cifs_strict_readv()") Cc: stable@vger.kernel.org Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2024-07-19 11:08:57 -05:00
David Howells	de40579b90	cifs: Fix server re-repick on subrequest retry When a subrequest is marked for needing retry, netfs will call cifs_prepare_write() which will make cifs repick the server for the op before renegotiating credits; it then calls cifs_issue_write() which invokes smb2_async_writev() - which re-repicks the server. If a different server is then selected, this causes the increment of server->in_flight to happen against one record and the decrement to happen against another, leading to misaccounting. Fix this by just removing the repick code in smb2_async_writev(). As this is only called from netfslib-driven code, cifs_prepare_write() should always have been called first, and so server should never be NULL and the preparatory step is repeated in the event that we do a retry. The problem manifests as a warning looking something like: WARNING: CPU: 4 PID: 72896 at fs/smb/client/smb2ops.c:97 smb2_add_credits+0x3f0/0x9e0 [cifs] ... RIP: 0010:smb2_add_credits+0x3f0/0x9e0 [cifs] ... smb2_writev_callback+0x334/0x560 [cifs] cifs_demultiplex_thread+0x77a/0x11b0 [cifs] kthread+0x187/0x1d0 ret_from_fork+0x34/0x60 ret_from_fork_asm+0x1a/0x30 Which may be triggered by a number of different xfstests running against an Azure server in multichannel mode. generic/249 seems the most repeatable, but generic/215, generic/249 and generic/308 may also show it. Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Cc: stable@vger.kernel.org Reported-by: Steve French <smfrench@gmail.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Acked-by: Tom Talpey <tom@talpey.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: Aurelien Aptel <aaptel@suse.com> cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2024-07-19 11:08:57 -05:00
Steve French	ae4ccca471	cifs: fix noisy message on copy_file_range There are common cases where copy_file_range can noisily log "source and target of copy not on same server" e.g. the mv command across mounts to two different server's shares. Change this to informational rather than logging as an error. A followon patch will add dynamic trace points e.g. for cifs_file_copychunk_range Cc: stable@vger.kernel.org Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-07-19 11:08:38 -05:00
Qu Wenruo	c3ece6b7ff	btrfs: change BTRFS_MOUNT_* flags to 64bit type Currently the BTRFS_MOUNT_* flags are already beyond 32 bits, this is going to cause compilation errors for some 32 bit systems, as their unsigned long is only 32 bits long, thus flag BTRFS_MOUNT_IGNORESUPERFLAGS overflows and can lead to errors. Fix the problem by: - Migrate all existing BTRFS_MOUNT_* flags to unsigned long long - Migrate all mount option related variables to unsigned long long * btrfs_fs_info::mount_opt * btrfs_fs_context::mount_opt * mount_opt parameter of btrfs_check_options() * old_opts parameter of btrfs_remount_begin() * old_opts parameter of btrfs_remount_cleanup() * mount_opt parameter of btrfs_check_mountopts_zoned() * mount_opt and opt parameters of check_ro_option() Fixes: 32e6216512b4 ("btrfs: introduce new "rescue=ignoresuperflags" mount option") Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-07-19 17:20:23 +02:00
Kent Overstreet	2fa88b1919	bcachefs: kill btree_trans_too_many_iters() in bch2_bucket_alloc_freelist() When we're called via trans commit -> btree split -> allocator We may have already arbitrarily many btree_paths, for the transaction commit we're trying to do; when this happens, the btree_trans_too_many_iters() call causes us to livelock. Since the allocator calls btree_iter_dontneed to release paths as it iterates, this shouldn't cause any problems. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-18 21:06:03 -04:00
Linus Torvalds	720261cfc7	bcachefs changes for 6.11-rc1 (version 2) Additional fixes on top of the original 6.11 pull request: - undefined behaviour fixes, originally noted as breaking userspace LTO builds - fix a spurious warning in fsck_err, reported by Marcin - fix an integer overflow on trans->nr_updates, also reported by Marcin; this broke during deletion of highly fragmented indirect extents - Add comments for lockdep functions ====== - Metadata version 1.8: Stripe sectors accounting, BCH_DATA_unstriped This splits out the accounting of dirty sectors and stripe sectors in alloc keys; this lets us see stripe buckets that still have unstriped data in them. This is needed for ensuring that erasure coding is working correctly, as well as completing stripe creation after a crash. - Metadata version 1.9: Disk accounting rewrite The previous disk accounting scheme relied heavily on percpu counters that were also sharded by outstanding journal buffer; it was fast but not extensible or scalable, and meant that all accounting counters were recorded in every journal entry. The new disk accounting scheme stores accounting as normal btree keys; updates are deltas until they are flushed by the btree write buffer. This means we have no practical limit on the number of counters, and a new tagged union format that's easy to extend. We now have counters for compression type/ratio, per-snapshot-id usage, per-btree-id usage, and pending rebalance work. - Self healing on read IO/checksum error data is now automatically rewritten if we get a read error and then a successful retry - Mount API conversion (thanks to Thomas Bertschinger) - Better lockdep coverage Previously, btree node locks were tracked individually by lockdep, like any other lock. But we may take _many_ btree node locks simultaneously, we easily blow through the limit of 48 locks that lockdep can track, leading to lockdep turning itself off. Tracking each btree node lock individually isn't really necessary since we have our own cycle detector for deadlock avoidance and centralized tracking of btree node locks, so we now have a single lockdep_map in btree_trans for "any btree nodes are locked". - some more small incremental work towards online check_allocations - lots more debugging improvements, fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmaZmVYACgkQE6szbY3K bnayLBAAr/RB75neEVzNqzE/qLoz9HBfPs7NrGNZg3bLzie9BvcEKGf3VUs2mm83 6qN/bCJRBhd2vdMcFvIrYYqvD+F2qFFFrBjTzY20toReCwO6q5o8Ihfzv+uSu1qn Lg/6AbwwDwHPgSLcFM6yVfNsNBpOZ4tru7xble5/89RGp5uhbU38tsFwkUcngN/t 4GWjtUYC+rFZaJSbt+wtGtM++nSURhMu1rxbR3MnVLT3JNW0wiG+FnRymCRQOYwx eslI6wP3JArTKMvACJqXTDivmjUPNdvsJ26LW6b5KBLi411OgV219ek0mJlkc5gl lOOZ5LPmPqn0BxsIdqOd+/OGOpvYkfWEyDj6/46K8KKFt+i++vCTzqIeL06HGQFS oCZajoHseLTlWPZTnoi3/KPWkKSJyaROrvFPSHT5/9sJfeeFt88hNSMP9g4+S4GI QSXK70GzEVjxr7YzUZwZUmRGWHt7YcS/qkTKJZRkLwUbr2BnrKEJhOa0t/i3RopN glFP/hP3dObTcWUF4xH90htfD2E/wHbP7nPwNCL6367n18Uj4TJD09vtM8xHsXSR YXECCjfskVDHAQJw/aV5jF1NpaeSW6g/bILi8tQhvRpfzLLvsaVGxA1xp5v+VZEQ w3WBepPofEE1EaVfKNeHCiE7STAnSVbWYWTatmjPdqVCyT7C9S0= =tnhh -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-07-18.2' of https://evilpiepirate.org/git/bcachefs Pull bcachefs updates from Kent Overstreet: - Metadata version 1.8: Stripe sectors accounting, BCH_DATA_unstriped This splits out the accounting of dirty sectors and stripe sectors in alloc keys; this lets us see stripe buckets that still have unstriped data in them. This is needed for ensuring that erasure coding is working correctly, as well as completing stripe creation after a crash. - Metadata version 1.9: Disk accounting rewrite The previous disk accounting scheme relied heavily on percpu counters that were also sharded by outstanding journal buffer; it was fast but not extensible or scalable, and meant that all accounting counters were recorded in every journal entry. The new disk accounting scheme stores accounting as normal btree keys; updates are deltas until they are flushed by the btree write buffer. This means we have no practical limit on the number of counters, and a new tagged union format that's easy to extend. We now have counters for compression type/ratio, per-snapshot-id usage, per-btree-id usage, and pending rebalance work. - Self healing on read IO/checksum error Data is now automatically rewritten if we get a read error and then a successful retry - Mount API conversion (thanks to Thomas Bertschinger) - Better lockdep coverage Previously, btree node locks were tracked individually by lockdep, like any other lock. But we may take _many_ btree node locks simultaneously, we easily blow through the limit of 48 locks that lockdep can track, leading to lockdep turning itself off. Tracking each btree node lock individually isn't really necessary since we have our own cycle detector for deadlock avoidance and centralized tracking of btree node locks, so we now have a single lockdep_map in btree_trans for "any btree nodes are locked". - Some more small incremental work towards online check_allocations - Lots more debugging improvements - Fixes, including: - undefined behaviour fixes, originally noted as breaking userspace LTO builds - fix a spurious warning in fsck_err, reported by Marcin - fix an integer overflow on trans->nr_updates, also reported by Marcin; this broke during deletion of highly fragmented indirect extents * tag 'bcachefs-2024-07-18.2' of https://evilpiepirate.org/git/bcachefs: (120 commits) lockdep: Add comments for lockdep_set_no{validate,track}_class() bcachefs: Fix integer overflow on trans->nr_updates bcachefs: silence silly kdoc warning bcachefs: Fix fsck warning about btree_trans not passed to fsck error bcachefs: Add an error message for insufficient rw journal devs bcachefs: varint: Avoid left-shift of a negative value bcachefs: darray: Don't pass NULL to memcpy() bcachefs: Kill bch2_assert_btree_nodes_not_locked() bcachefs: Rename BCH_WRITE_DONE -> BCH_WRITE_SUBMITTED bcachefs: __bch2_read(): call trans_begin() on every loop iter bcachefs: show none if label is not set bcachefs: drop packed, aligned from bkey_inode_buf bcachefs: btree node scan: fall back to comparing by journal seq bcachefs: Add lockdep support for btree node locks lockdep: lockdep_set_notrack_class() bcachefs: Improve copygc_wait_to_text() bcachefs: Convert clock code to u64s bcachefs: Improve startup message bcachefs: Self healing on read IO error bcachefs: Make read_only a mount option again, but hidden ...	2024-07-18 17:27:43 -07:00
Linus Torvalds	4f40c636b2	NFS Client Updates for Linux 6.11 New Features: * Add support for large folios * Implement rpcrdma generic device removal notification * Add client support for attribute delegations * Use a LAYOUTRETURN during reboot recovery to report layoutstats and errors * Improve throughput for random buffered writes * Add NVMe support to pnfs/blocklayout Bugfixes: * Fix rpcrdma_reqs_reset() * Avoid soft lockups when using UDP * Fix an nfs/blocklayout premature PR key unregestration * Another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server * Do not extend writes to the entire folio * Pass explicit offset and count values to tracepoints * Fix a race to wake up sleeping SUNRPC sync tasks * Fix gss_status tracepoint output Cleanups: * Add missing MODULE_DESCRIPTION() macros * Add blocklayout / SCSI layout tracepoints * Remove asm-generic headers from xprtrdma verbs.c * Remove unused 'struct mnt_fhstatus' * Other delegation related cleanups * Other folio related cleanups * Other pNFS related cleanups * Other xprtrdma cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmaZgr0ACgkQ18tUv7Cl QOv8FxAAnUyYG7Kdbv+5Ko/SFv0imxCb5DQh2XC/hSHNrlKBlDnqe2PANXR9XocL mS0Wry5tZf/T+o+QoKv0HQUdWFlnqKzwclggrekf/lkioU1feWsLe2RzDl1iUh0V 6fwcCyWXW1mYX2CtCaDe+/ZFcoZOMD+bItNHt/RdDScSnS9Jd8GSyocsVKsqaBx6 3wub0FJ4UBgYNoX2T3YyK2JwvO9GLaKIQRJV74rjgPJKjcjhptbcb5MKBmOZrF95 UCcpl4CwvD9RTsSEp0B98UbAFFpk8Nw1tmHF3GmyG/nsrJomDuLKFvbsiq23eHUf XeULZIbjMEzU56vjoTglZA4s7JYx17D0vzdPGUqU4mLN3LPm5LtGLBg2uQoPw/xW 50euLU+ol36mfnQlBsuM/tAXgtoAcT63aNeNRNp8aOL47xA+PC6kWTBK9OaR5+x6 w+d22Dpy+riMk1TRaAVt0ANcENKELsWRFvxkuWCpQhVoQ1h8LigQJzeggEEK7Sa6 5u9H6wCTee2wz746uwA43koj1utuyrLq/5S+qEtCY1pbP3U0A+Gh0Xh00OXiYuzL TgRdksmiAL8cA51WjSrq6HhGLOUJAYLfbdKaVhW+fULxUVwzWhFFaFbbdiq/e4OR 0pfqls8UZWICE51GeTfalEidpKZgV/LxU3QOuVoalWBULyj/TeI= =avTW -----END PGP SIGNATURE----- Merge tag 'nfs-for-6.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs Pull NFS client updates from Anna Schumaker: "New Features: - Add support for large folios - Implement rpcrdma generic device removal notification - Add client support for attribute delegations - Use a LAYOUTRETURN during reboot recovery to report layoutstats and errors - Improve throughput for random buffered writes - Add NVMe support to pnfs/blocklayout Bugfixes: - Fix rpcrdma_reqs_reset() - Avoid soft lockups when using UDP - Fix an nfs/blocklayout premature PR key unregestration - Another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server - Do not extend writes to the entire folio - Pass explicit offset and count values to tracepoints - Fix a race to wake up sleeping SUNRPC sync tasks - Fix gss_status tracepoint output Cleanups: - Add missing MODULE_DESCRIPTION() macros - Add blocklayout / SCSI layout tracepoints - Remove asm-generic headers from xprtrdma verbs.c - Remove unused 'struct mnt_fhstatus' - Other delegation related cleanups - Other folio related cleanups - Other pNFS related cleanups - Other xprtrdma cleanups" * tag 'nfs-for-6.11-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (63 commits) SUNRPC: Fixup gss_status tracepoint error output SUNRPC: Fix a race to wake a sync task nfs: split nfs_read_folio nfs: pass explicit offset/count to trace events nfs: do not extend writes to the entire folio nfs/blocklayout: add support for NVMe nfs: remove nfs_page_length nfs: remove the unused max_deviceinfo_size field from struct pnfs_layoutdriver_type nfs: don't reuse partially completed requests in nfs_lock_and_join_requests nfs: move nfs_wait_on_request to write.c nfs: fold nfs_page_group_lock_subrequests into nfs_lock_and_join_requests nfs: fold nfs_folio_find_and_lock_request into nfs_lock_and_join_requests nfs: simplify nfs_folio_find_and_lock_request nfs: remove nfs_folio_private_request nfs: remove dead code for the old swap over NFS implementation NFSv4.1 another fix for EXCHGID4_FLAG_USE_PNFS_DS for DS server nfs: Block on write congestion nfs: Properly initialize server->writeback nfs: Drop pointless check from nfs_commit_release_pages() nfs/blocklayout: SCSI layout trace points for reservation key reg/unreg ...	2024-07-18 17:17:30 -07:00
Linus Torvalds	51ed42a8a1	Many cleanups and bug fixes in ext4, especially for the fast commit feature. Also some performance improvements; in particular, improving IOPS and throughput on fast devices running Async Direct I/O by up to 20% by optimizing jbd2_transaction_committed(). -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmaYiqsACgkQ8vlZVpUN gaOWpQf/d6Y9WGyjeC1jOc+vIBxLgL+X0kbzYkkjGTSIZ7mZJS9X4NMMEtqayJ4f 1zGobcGENc05l4LVxf3uMbDj1aGlHeI9X4GLGaP5s5NcaAl4HKjQ3aFs3MuiJHPj Ol2CebXJx+NKt1lkD8PSPGgaTb5zg+SeZifI+OZ1RpkcKmGnkSNa5NkUNAaBh6dl 5LLXTc2p9NcCwAwDAQSiAJCV35bAZpcp6fwLLaPQ6Eok9HxGcJuYXW2Fict4rbtV mXeogXVIo2bkMcfh6tDchDBrFvORYIA7uBVmaG1LgAMrtEnYxnxnEntD0h6j/bzF Fl4jjQfd8o2uYto/4eo+iY6Z0haxyQ== =rcOo -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Many cleanups and bug fixes in ext4, especially for the fast commit feature. Also some performance improvements; in particular, improving IOPS and throughput on fast devices running Async Direct I/O by up to 20% by optimizing jbd2_transaction_committed()" * tag 'ext4_for_linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (40 commits) ext4: make sure the first directory block is not a hole ext4: check dot and dotdot of dx_root before making dir indexed ext4: sanity check for NULL pointer after ext4_force_shutdown jbd2: increase maximum transaction size jbd2: drop pointless shrinker batch initialization jbd2: avoid infinite transaction commit loop jbd2: precompute number of transaction descriptor blocks jbd2: make jbd2_journal_get_max_txn_bufs() internal jbd2: avoid mount failed when commit block is partial submitted ext4: avoid writing unitialized memory to disk in EA inodes ext4: don't track ranges in fast_commit if inode has inlined data ext4: fix possible tid_t sequence overflows ext4: use ext4_update_inode_fsync_trans() helper in inode creation ext4: add missing MODULE_DESCRIPTION() jbd2: add missing MODULE_DESCRIPTION() ext4: use memtostr_pad() for s_volume_name jbd2: speed up jbd2_transaction_committed() ext4: make ext4_da_map_blocks() buffer_head unaware ext4: make ext4_insert_delayed_block() insert multi-blocks ext4: factor out a helper to check the cluster allocation state ...	2024-07-18 17:03:42 -07:00
Linus Torvalds	dddebdece6	vfs-6.11-rc1.fixes -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZpjJRwAKCRCRxhvAZXjc os63AQCCoMzENXY4d5N80H+L6ro1Ccj/nwMACwuvwcDN9pCj7gD/X1T/pidHdagh kmzfltsva4TA77Zg6kfuzvd1tA3PnwE= =rXm4 -----END PGP SIGNATURE----- Merge tag 'vfs-6.11-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix a missing rcu_read_unlock() in nsfs by switching to a cleanup guard - Add missing module descriptor for adfs * tag 'vfs-6.11-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nsfs: use cleanup guard fs/adfs: add MODULE_DESCRIPTION	2024-07-18 16:59:02 -07:00
Tavian Barnes	73f88592dd	bcachefs: mean_and_variance: Avoid too-large shift amounts Shifting a value by the width of its type or more is undefined. Signed-off-by: Tavian Barnes <tavianator@tavianator.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-18 18:33:30 -04:00
Kent Overstreet	6f719cbe0c	bcachefs: Fix integer overflow on trans->nr_updates We can't have more updates than paths, so btree_path_idx_t is the correct type to use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-18 18:33:30 -04:00
Kent Overstreet	f05a0b9c73	bcachefs: silence silly kdoc warning Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-18 18:33:30 -04:00
Kent Overstreet	2c4c17fefc	bcachefs: Fix fsck warning about btree_trans not passed to fsck error If a btree_trans is in use it's supposed to be passed to fsck_err so that it can be unlocked if we're waiting on userspace input; but the btree IO paths do call fsck errors where a btree_trans exists on the stack but it's not passed through. But it's ok, because it's unlocked while doing IO. Fixes: a850bde6498b ("bcachefs: fsck_err() may now take a btree_trans") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-07-18 18:33:30 -04:00

1 2 3 4 5 ...

92896 Commits