linux

iv/linux

Author	SHA1	Message	Date
Günther Noack	d1654fd98b	fs/ioctl: Add a comment to keep the logic in sync with LSM policies Landlock's IOCTL support needs to partially replicate the list of IOCTLs from do_vfs_ioctl(). The list of commands implemented in do_vfs_ioctl() should be kept in sync with Landlock's IOCTL policies. Suggested-by: Paul Moore <paul@paul-moore.com> Suggested-by: Mickaël Salaün <mic@digikod.net> Signed-off-by: Günther Noack <gnoack@google.com> Link: https://lore.kernel.org/r/20240419161122.2023765-12-gnoack@google.com Signed-off-by: Mickaël Salaün <mic@digikod.net>	2024-05-13 06:58:35 +02:00
Namjae Jeon	c91ecba9e4	ksmbd: avoid to send duplicate oplock break notifications This patch fixes generic/011 when oplocks is enable. Avoid to send duplicate oplock break notifications like smb2 leases case. Fixes: `97c2ec6466` ("ksmbd: avoid to send duplicate lease break notifications") Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2024-05-12 16:53:16 -05:00
Wang Yong	af9a8730dd	jffs2: Fix potential illegal address access in jffs2_free_inode During the stress testing of the jffs2 file system,the following abnormal printouts were found: [ 2430.649000] Unable to handle kernel paging request at virtual address 0069696969696948 [ 2430.649622] Mem abort info: [ 2430.649829] ESR = 0x96000004 [ 2430.650115] EC = 0x25: DABT (current EL), IL = 32 bits [ 2430.650564] SET = 0, FnV = 0 [ 2430.650795] EA = 0, S1PTW = 0 [ 2430.651032] FSC = 0x04: level 0 translation fault [ 2430.651446] Data abort info: [ 2430.651683] ISV = 0, ISS = 0x00000004 [ 2430.652001] CM = 0, WnR = 0 [ 2430.652558] [0069696969696948] address between user and kernel address ranges [ 2430.653265] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 2430.654512] CPU: 2 PID: 20919 Comm: cat Not tainted 5.15.25-g512f31242bf6 #33 [ 2430.655008] Hardware name: linux,dummy-virt (DT) [ 2430.655517] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2430.656142] pc : kfree+0x78/0x348 [ 2430.656630] lr : jffs2_free_inode+0x24/0x48 [ 2430.657051] sp : ffff800009eebd10 [ 2430.657355] x29: ffff800009eebd10 x28: 0000000000000001 x27: 0000000000000000 [ 2430.658327] x26: ffff000038f09d80 x25: 0080000000000000 x24: ffff800009d38000 [ 2430.658919] x23: 5a5a5a5a5a5a5a5a x22: ffff000038f09d80 x21: ffff8000084f0d14 [ 2430.659434] x20: ffff0000bf9a6ac0 x19: 0169696969696940 x18: 0000000000000000 [ 2430.659969] x17: ffff8000b6506000 x16: ffff800009eec000 x15: 0000000000004000 [ 2430.660637] x14: 0000000000000000 x13: 00000001000820a1 x12: 00000000000d1b19 [ 2430.661345] x11: 0004000800000000 x10: 0000000000000001 x9 : ffff8000084f0d14 [ 2430.662025] x8 : ffff0000bf9a6b40 x7 : ffff0000bf9a6b48 x6 : 0000000003470302 [ 2430.662695] x5 : ffff00002e41dcc0 x4 : ffff0000bf9aa3b0 x3 : 0000000003470342 [ 2430.663486] x2 : 0000000000000000 x1 : ffff8000084f0d14 x0 : fffffc0000000000 [ 2430.664217] Call trace: [ 2430.664528] kfree+0x78/0x348 [ 2430.664855] jffs2_free_inode+0x24/0x48 [ 2430.665233] i_callback+0x24/0x50 [ 2430.665528] rcu_do_batch+0x1ac/0x448 [ 2430.665892] rcu_core+0x28c/0x3c8 [ 2430.666151] rcu_core_si+0x18/0x28 [ 2430.666473] __do_softirq+0x138/0x3cc [ 2430.666781] irq_exit+0xf0/0x110 [ 2430.667065] handle_domain_irq+0x6c/0x98 [ 2430.667447] gic_handle_irq+0xac/0xe8 [ 2430.667739] call_on_irq_stack+0x28/0x54 The parameter passed to kfree was 5a5a5a5a, which corresponds to the target field of the jffs_inode_info structure. It was found that all variables in the jffs_inode_info structure were 5a5a5a5a, except for the first member sem. It is suspected that these variables are not initialized because they were set to 5a5a5a5a during memory testing, which is meant to detect uninitialized memory.The sem variable is initialized in the function jffs2_i_init_once, while other members are initialized in the function jffs2_init_inode_info. The function jffs2_init_inode_info is called after iget_locked, but in the iget_locked function, the destroy_inode process is triggered, which releases the inode and consequently, the target member of the inode is not initialized.In concurrent high pressure scenarios, iget_locked may enter the destroy_inode branch as described in the code. Since the destroy_inode functionality of jffs2 only releases the target, the fix method is to set target to NULL in jffs2_i_init_once. Signed-off-by: Wang Yong <wang.yong12@zte.com.cn> Reviewed-by: Lu Zhongjun <lu.zhongjun@zte.com.cn> Reviewed-by: Yang Tao <yang.tao172@zte.com.cn> Cc: Xu Xin <xu.xin16@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Signed-off-by: Richard Weinberger <richard@nod.at>	2024-05-12 22:57:04 +02:00
Kunwu Chan	7096fae56f	jffs2: Simplify the allocation of slab caches Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. And change cache name from 'jffs2_tmp_dnode' to 'jffs2_tmp_dnode_info'. Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2024-05-12 22:17:41 +02:00
Randy Dunlap	2e0a808224	jffs2: nodemgmt: fix kernel-doc comments Update the end of one sentence where a comment was truncated. (dwmw2) Fix a bunch of kernel-doc warnings: nodemgmt.c:72: warning: Function parameter or member 'sumsize' not described in 'jffs2_do_reserve_space' nodemgmt.c:72: warning: expecting prototype for jffs2_reserve_space(). Prototype was for jffs2_do_reserve_space() instead nodemgmt.c:76: warning: Function parameter or member 'sumsize' not described in 'jffs2_reserve_space' nodemgmt.c:76: warning: No description found for return value of 'jffs2_reserve_space' nodemgmt.c:503: warning: Function parameter or member 'ofs' not described in 'jffs2_add_physical_node_ref' nodemgmt.c:503: warning: Function parameter or member 'ic' not described in 'jffs2_add_physical_node_ref' nodemgmt.c:503: warning: Excess function parameter 'new' description in 'jffs2_add_physical_node_ref' nodemgmt.c:503: warning: No description found for return value of 'jffs2_add_physical_node_ref' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: linux-mtd@lists.infradead.org Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2024-05-12 22:16:04 +02:00
Christian Heusel	0162a70d8e	jffs2: print symbolic error name instead of error code Utilize the %pe print specifier to get the symbolic error name as a string (i.e "-ENOMEM") in the log message instead of the error code to increase its readablility. This change was suggested in https://lore.kernel.org/all/92972476-0b1f-4d0a-9951-af3fc8bc6e65@suswa.mountain/ Signed-off-by: Christian Heusel <christian@heusel.eu> Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2024-05-12 22:13:00 +02:00
Rik van Riel	5cbcb62ddd	fs/proc: fix softlockup in __read_vmcore While taking a kernel core dump with makedumpfile on a larger system, softlockup messages often appear. While softlockup warnings can be harmless, they can also interfere with things like RCU freeing memory, which can be problematic when the kdump kexec image is configured with as little memory as possible. Avoid the softlockup, and give things like work items and RCU a chance to do their thing during __read_vmcore by adding a cond_resched. Link: https://lkml.kernel.org/r/20240507091858.36ff767f@imladris.surriel.com Signed-off-by: Rik van Riel <riel@surriel.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-05-11 15:51:44 -07:00
Ryusuke Konishi	0a73eac1ed	nilfs2: convert BUG_ON() in nilfs_finish_roll_forward() to WARN_ON() The BUG_ON check performed on the return value of __getblk() in nilfs_finish_roll_forward() assumes that a buffer that has been successfully read once is retrieved with the same parameters and does not fail (__getblk() does not return an error due to memory allocation failure). Also, nilfs_finish_roll_forward() is called at most once during mount. Taking these into consideration, rewrite the check to use WARN_ON() to avoid using BUG_ON(). Link: https://lkml.kernel.org/r/20240508221429.7559-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-05-11 15:51:44 -07:00
Matthew Wilcox (Oracle)	a7ac59f4f2	nilfs2: remove calls to folio_set_error() and folio_clear_error() Nobody checks this flag on nilfs2 folios, stop setting and clearing it. That lets us simplify nilfs_end_folio_io() slightly. Link: https://lkml.kernel.org/r/20240420025029.2166544-17-willy@infradead.org Link: https://lkml.kernel.org/r/20240430050901.3239-1-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: kernel test robot <lkp@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-05-11 15:51:43 -07:00
Jens Axboe	fe6532b44a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next into net-accept-more * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1557 commits) net: qede: use extack in qede_parse_actions() net: qede: propagate extack through qede_flow_spec_validate() net: qede: use faked extack in qede_flow_spec_to_rule() net: qede: use extack in qede_parse_flow_attr() net: qede: add extack in qede_add_tc_flower_fltr() net: qede: use extack in qede_flow_parse_udp_v4() net: qede: use extack in qede_flow_parse_udp_v6() net: qede: use extack in qede_flow_parse_tcp_v4() net: qede: use extack in qede_flow_parse_tcp_v6() net: qede: use extack in qede_flow_parse_v4_common() net: qede: use extack in qede_flow_parse_v6_common() net: qede: use extack in qede_set_v4_tuple_to_profile() net: qede: use extack in qede_set_v6_tuple_to_profile() net: qede: use extack in qede_flow_parse_ports() net: usb: smsc95xx: stop lying about skb->truesize net: dsa: microchip: Fix spellig mistake "configur" -> "configure" af_unix: Add dead flag to struct scm_fp_list. net: ethernet: adi: adin1110: Replace linux/gpio.h by proper one octeontx2-pf: Reuse Transmit queue/Send queue index of HTB class gve: Use ethtool_sprintf/puts() to fill stats strings ...	2024-05-11 08:25:55 -06:00
Chao Yu	a798ff17cd	f2fs: fix to add missing iput() in gc_data_segment() During gc_data_segment(), if inode state is abnormal, it missed to call iput(), fix it. Fixes: `b73e52824c` ("f2fs: reposition unlock_new_inode to prevent accessing invalid inode") Fixes: `9056d6489f` ("f2fs: fix to do sanity check on inode type during garbage collection") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2024-05-11 00:40:07 +00:00
Daeho Jeong	f2526c5cf1	f2fs: allow dirty sections with zero valid block for checkpoint disabled Following the semantic for dirty segments in checkpoint disabled mode, apply the same rule to dirty sections. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2024-05-11 00:39:07 +00:00
Linus Torvalds	c22c3e0753	18 hotfixes, 7 of which are cc:stable. More fixups for this cycle's page_owner updates. And a few userfaultfd fixes. Otherwise, random singletons - see the individual changelogs for details. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZj6AhAAKCRDdBJ7gKXxA jsvHAQCoSRI4qM0a6j5Fs2Q+B1in+kGWTe50q5Rd755VgolEsgD8CUASDgZ2Qv7g yDAlluXMv4uvA4RqkZvDiezsENzYQw0= =MApd -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2024-05-10-13-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM fixes from Andrew Morton: "18 hotfixes, 7 of which are cc:stable. More fixups for this cycle's page_owner updates. And a few userfaultfd fixes. Otherwise, random singletons - see the individual changelogs for details" * tag 'mm-hotfixes-stable-2024-05-10-13-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mailmap: add entry for Barry Song selftests/mm: fix powerpc ARCH check mailmap: add entry for John Garry XArray: set the marks correctly when splitting an entry selftests/vDSO: fix runtime errors on LoongArch selftests/vDSO: fix building errors on LoongArch mm,page_owner: don't remove __GFP_NOLOCKDEP in add_stack_record_to_list fs/proc/task_mmu: fix uffd-wp confusion in pagemap_scan_pmd_entry() fs/proc/task_mmu: fix loss of young/dirty bits during pagemap scan mm/vmalloc: fix return value of vb_alloc if size is 0 mm: use memalloc_nofs_save() in page_cache_ra_order() kmsan: compiler_types: declare __no_sanitize_or_inline lib/test_xarray.c: fix error assumptions on check_xa_multi_store_adv_add() tools: fix userspace compilation with new test_xarray changes MAINTAINERS: update URL's for KEYS/KEYRINGS_INTEGRITY and TPM DEVICE DRIVER mm: page_owner: fix wrong information in dump_page_owner maple_tree: fix mas_empty_area_rev() null pointer dereference mm/userfaultfd: reset ptes when close() for wr-protected ones	2024-05-10 14:16:03 -07:00
Peter-Jan Gootzen	529395d2ae	virtio-fs: add multi-queue support This commit creates a multi-queue mapping at device bring-up. The driver first attempts to use the existing MSI-X interrupt affinities (previously disabled), and if not present, will distribute the request queues evenly over the CPUs. If the latter fails as well, all CPUs are mapped to request queue zero. When a request is handed from FUSE to the virtio-fs device driver, the driver will use the current CPU to index into the multi-queue mapping and determine the optimal request queue to use. We measured the performance of this patch with the fio benchmarking tool, increasing the number of queues results in a significant speedup for both read and write operations, demonstrating the effectiveness of multi-queue support. Host: - Dell PowerEdge R760 - CPU: Intel(R) Xeon(R) Gold 6438M, 128 cores - VM: KVM with 32 cores Virtio-fs device: - BlueField-3 DPU - CPU: ARM Cortex-A78AE, 16 cores - One thread per queue, each busy polling on one request queue - Each queue is 1024 descriptors deep Workload: - fio, sequential read or write, ioengine=libaio, numjobs=32, 4GiB file per job, iodepth=8, bs=256KiB, runtime=30s Performance Results: +===========================+==========+===========+ \| Number of queues \| Fio read \| Fio write \| +===========================+==========+===========+ \| 1 request queue (GiB/s) \| 6.1 \| 4.6 \| +---------------------------+----------+-----------+ \| 8 request queues (GiB/s) \| 25.8 \| 10.3 \| +---------------------------+----------+-----------+ \| 16 request queues (GiB/s) \| 30.9 \| 19.5 \| +---------------------------+----------+-----------+ \| 32 request queue (GiB/s) \| 33.2 \| 22.6 \| +---------------------------+----------+-----------+ \| Speedup \| 5.5x \| 5x \| +---------------=-----------+----------+-----------+ Signed-off-by: Peter-Jan Gootzen <pgootzen@nvidia.com> Signed-off-by: Yoray Zack <yorayz@nvidia.com> Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-05-10 13:38:14 +02:00
Peter-Jan Gootzen	103c2de111	virtio-fs: limit number of request queues Virtio-fs devices might allocate significant resources to virtio queues such as CPU cores that busy poll on the queue. The device indicates how many request queues it can support and the driver should initialize the number of queues that they want to utilize. In this patch we limit the number of initialized request queues to the number of CPUs, to limit the resource consumption on the device-side and to prepare for the upcoming multi-queue patch. Signed-off-by: Peter-Jan Gootzen <pgootzen@nvidia.com> Signed-off-by: Yoray Zack <yorayz@nvidia.com> Suggested-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-05-10 13:38:14 +02:00
Thorsten Blum	e9229c18da	ovl: remove duplicate included header Remove duplicate included header file linux/posix_acl.h Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-05-10 13:22:46 +02:00
Hou Tao	246014876d	fuse: clear FR_SENT when re-adding requests into pending list The following warning was reported by lee bruce: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 8264 at fs/fuse/dev.c:300 fuse_request_end+0x685/0x7e0 fs/fuse/dev.c:300 Modules linked in: CPU: 0 PID: 8264 Comm: ab2 Not tainted 6.9.0-rc7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:fuse_request_end+0x685/0x7e0 fs/fuse/dev.c:300 ...... Call Trace: <TASK> fuse_dev_do_read.constprop.0+0xd36/0x1dd0 fs/fuse/dev.c:1334 fuse_dev_read+0x166/0x200 fs/fuse/dev.c:1367 call_read_iter include/linux/fs.h:2104 [inline] new_sync_read fs/read_write.c:395 [inline] vfs_read+0x85b/0xba0 fs/read_write.c:476 ksys_read+0x12f/0x260 fs/read_write.c:619 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xce/0x260 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f ...... </TASK> The warning is due to the FUSE_NOTIFY_RESEND notify sent by the write() syscall in the reproducer program and it happens as follows: (1) calls fuse_dev_read() to read the INIT request The read succeeds. During the read, bit FR_SENT will be set on the request. (2) calls fuse_dev_write() to send an USE_NOTIFY_RESEND notify The resend notify will resend all processing requests, so the INIT request is moved from processing list to pending list again. (3) calls fuse_dev_read() with an invalid output address fuse_dev_read() will try to copy the same INIT request to the output address, but it will fail due to the invalid address, so the INIT request is ended and triggers the warning in fuse_request_end(). Fix it by clearing FR_SENT when re-adding requests into pending list. Acked-by: Miklos Szeredi <mszeredi@redhat.com> Reported-by: xingwei lee <xrivendell7@gmail.com> Reported-by: yue sun <samsun1006219@gmail.com> Closes: https://lore.kernel.org/linux-fsdevel/58f13e47-4765-fce4-daf4-dffcc5ae2330@huaweicloud.com/T/#m091614e5ea2af403b259e7cea6a49e51b9ee07a7 Fixes: `760eac73f9` ("fuse: Introduce a new notification type for resend pending requests") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-05-10 11:10:39 +02:00
Hou Tao	42815f8ac5	fuse: set FR_PENDING atomically in fuse_resend() When fuse_resend() moves the requests from processing lists to pending list, it uses __set_bit() to set FR_PENDING bit in req->flags. Using __set_bit() is not safe, because other functions may update req->flags concurrently (e.g., request_wait_answer() may call set_bit(FR_INTERRUPTED, &flags)). Fix it by using set_bit() instead. Fixes: `760eac73f9` ("fuse: Introduce a new notification type for resend pending requests") Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-05-10 11:10:12 +02:00
David Howells	da0e01cc70	afs: Fix fileserver rotation getting stuck Fix the fileserver rotation code in a couple of ways: (1) op->server_states is an array, not a pointer to a single record, so fix the places that access it to index it. (2) In the places that go through an address list to work out which one has the best priority, fix the loops to skip known failed addresses. Without this, the rotation algorithm may get stuck on addresses that are inaccessible or don't respond. This can be triggered manually by finding a server that advertises a non-routable address and giving it a higher priority, eg.: echo "add udp 192.168.0.0/16 3000" >/proc/fs/afs/addr_prefs if the server, say, includes the address 192.168.7.7 in its address list, and then attempting to access a volume on that server. Fixes: `495f2ae9e3` ("afs: Fix fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/4005300.1712309731@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/998836.1714746152@warthog.procyon.org.uk Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-05-10 08:49:17 +02:00
Linus Torvalds	c62b758bae	fcntl: add F_DUPFD_QUERY fcntl() Often userspace needs to know whether two file descriptors refer to the same struct file. For example, systemd uses this to filter out duplicate file descriptors in it's file descriptor store (cf. [1]) and vulkan uses it to compare dma-buf fds (cf. [2]). The only api we provided for this was kcmp() but that's not generally available or might be disallowed because it is way more powerful (allows ordering of file pointers, operates on non-current task) etc. So give userspace a simple way of comparing two file descriptors for sameness adding a new fcntl() F_DUDFD_QUERY. Link: `a4f0e0da35/src/basic/fd-util.c (L517)` [1] Link: https://gitlab.freedesktop.org/wlroots/wlroots/-/blob/master/render/vulkan/texture.c#L490 [2] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [brauner: commit message] Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-05-10 08:26:31 +02:00
Chao Yu	29ed2b5dd5	f2fs: compress: don't allow unaligned truncation on released compress inode f2fs image may be corrupted after below testcase: - mkfs.f2fs -O extra_attr,compression -f /dev/vdb - mount /dev/vdb /mnt/f2fs - touch /mnt/f2fs/file - f2fs_io setflags compression /mnt/f2fs/file - dd if=/dev/zero of=/mnt/f2fs/file bs=4k count=4 - f2fs_io release_cblocks /mnt/f2fs/file - truncate -s 8192 /mnt/f2fs/file - umount /mnt/f2fs - fsck.f2fs /dev/vdb [ASSERT] (fsck_chk_inode_blk:1256) --> ino: 0x5 has i_blocks: 0x00000002, but has 0x3 blocks [FSCK] valid_block_count matching with CP [Fail] [0x4, 0x5] [FSCK] other corrupted bugs [Fail] The reason is: partial truncation assume compressed inode has reserved blocks, after partial truncation, valid block count may change w/o .i_blocks and .total_valid_block_count update, result in corruption. This patch only allow cluster size aligned truncation on released compress inode for fixing. Fixes: `c61404153e` ("f2fs: introduce FI_COMPRESS_RELEASED instead of using IMMUTABLE bit") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2024-05-10 03:38:28 +00:00
Chao Yu	0fa4e57c1d	f2fs: fix to release node block count in error path of f2fs_new_node_page() It missed to call dec_valid_node_count() to release node block count in error path, fix it. Fixes: `141170b759` ("f2fs: fix to avoid use f2fs_bug_on() in f2fs_new_node_page()") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2024-05-10 03:38:28 +00:00
Chao Yu	0a4ed2d97c	f2fs: compress: fix to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock It needs to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock to avoid racing with checkpoint, otherwise, filesystem metadata including blkaddr in dnode, inode fields and .total_valid_block_count may be corrupted after SPO case. Fixes: `ef8d563f18` ("f2fs: introduce F2FS_IOC_RELEASE_COMPRESS_BLOCKS") Fixes: `c75488fb4d` ("f2fs: introduce F2FS_IOC_RESERVE_COMPRESS_BLOCKS") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2024-05-10 03:38:28 +00:00
Chao Yu	043c832371	f2fs: compress: fix error path of inc_valid_block_count() If inc_valid_block_count() can not allocate all requested blocks, it needs to release block count in .total_valid_block_count and resevation blocks in inode. Fixes: `5460749487` ("f2fs: compress: fix to avoid inconsistence bewteen i_blocks and dnode") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2024-05-10 03:38:28 +00:00
Chao Yu	a3a0bc6c22	f2fs: compress: fix typo in f2fs_reserve_compress_blocks() s/released/reserved. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2024-05-10 03:38:27 +00:00
Chao Yu	186e7d7153	f2fs: compress: fix to update i_compr_blocks correctly Previously, we account reserved blocks and compressed blocks into @compr_blocks, then, f2fs_i_compr_blocks_update(,compr_blocks) will update i_compr_blocks incorrectly, fix it. Meanwhile, for the case all blocks in cluster were reserved, fix to update dn->ofs_in_node correctly. Fixes: `eb8fbaa533` ("f2fs: compress: fix to check unreleased compressed cluster") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2024-05-10 03:38:27 +00:00
Thomas Bertschinger	07f9a27f19	bcachefs: add no_invalid_checks flag Setting this flag on a filesystem results in validity checks being skipped when writing bkeys. This flag will be used by tooling that deliberately injects corruption into a filesystem in order to exercise fsck. It shouldn't be set outside of testing/debugging code. Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:24:30 -04:00
Daniel Hill	bceacfa97e	bcachefs: add counters for failed shrinker reclaim This adds distinct counters for every reason the btree node shrinker can fail to free an object - if our shrinker isn't making progress, this will tell us why. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:24:29 -04:00
Kent Overstreet	692aa7a54b	bcachefs: Fix sb_field_downgrade validation - bch2_sb_downgrade_validate() wasn't checking for a downgrade entry extending past the end of the superblock section - for_each_downgrade_entry() is used in to_text() and needs to work on malformed input; it also was missing a check for a field extending past the end of the section Reported-by: syzbot+e49ccab73449180bc9be@syzkaller.appspotmail.com Fixes: `84f1638795` ("bcachefs: bch_sb_field_downgrade") Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	a5c3e265d3	bcachefs: Plumb bch_validate_flags to sb_field_ops.validate() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	65eaf4e24a	bcachefs: s/bkey_invalid_flags/bch_validate_flags We're about to start using bch_validate_flags for superblock section validation - it's no longer bkey specific. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	d09a8468d9	bcachefs: fsync() should not return -EROFS fsync has a slightly odd usage of -EROFS, where it means "does not support fsync". I didn't choose it... Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	99179fb898	bcachefs: Invalid devices are now checked for by fsck, not .invalid methods Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	2f4b4a3b44	bcachefs: kill bch2_dev_bkey_exists() in bch2_check_fix_ptrs() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	02b7fa4fe5	bcachefs: kill bch2_dev_bkey_exists() in bch2_read_endio() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	2c91ab7262	bcachefs: bch2_dev_get_ioref() checks for device not present Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	465bf6f42a	bcachefs: bch2_dev_get_ioref2(); io_read.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:36 -04:00
Kent Overstreet	91ffdecfc7	bcachefs: bch2_dev_get_ioref2(); debug.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:35 -04:00
Kent Overstreet	6212ea2497	bcachefs: bch2_dev_get_ioref2(); journal_io.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:35 -04:00
Kent Overstreet	48af853925	bcachefs: bch2_dev_get_ioref2(); io_write.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:35 -04:00
Kent Overstreet	690f7cdf73	bcachefs: bch2_dev_get_ioref2(); btree_io.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:35 -04:00
Kent Overstreet	466298e2f6	bcachefs: bch2_dev_get_ioref2(); backpointers.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:35 -04:00
Kent Overstreet	0e57996c69	bcachefs: bch2_dev_get_ioref2(); alloc_background.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:35 -04:00
Kent Overstreet	b6fb4269e7	bcachefs: for_each_bset() declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-09 16:23:34 -04:00
Masahiro Yamada	b1992c3772	kbuild: use $(src) instead of $(srctree)/$(src) for source directory Kbuild conventionally uses $(obj)/ for generated files, and $(src)/ for checked-in source files. It is merely a convention without any functional difference. In fact, $(obj) and $(src) are exactly the same, as defined in scripts/Makefile.build: src := $(obj) When the kernel is built in a separate output directory, $(src) does not accurately reflect the source directory location. While Kbuild resolves this discrepancy by specifying VPATH=$(srctree) to search for source files, it does not cover all cases. For example, when adding a header search path for local headers, -I$(srctree)/$(src) is typically passed to the compiler. This introduces inconsistency between upstream and downstream Makefiles because $(src) is used instead of $(srctree)/$(src) for the latter. To address this inconsistency, this commit changes the semantics of $(src) so that it always points to the directory in the source tree. Going forward, the variables used in Makefiles will have the following meanings: $(obj) - directory in the object tree $(src) - directory in the source tree (changed by this commit) $(objtree) - the top of the kernel object tree $(srctree) - the top of the kernel source tree Consequently, $(srctree)/$(src) in upstream Makefiles need to be replaced with $(src). Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>	2024-05-10 04:34:52 +09:00
Jakub Kicinski	e7073830cc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c `35d92abfba` ("net: hns3: fix kernel crash when devlink reload during initialization") `2a1a1a7b5f` ("net: hns3: add command queue trace for hns3") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-05-09 10:01:01 -07:00
Andy Shevchenko	1dd719a959	isofs: Use -y instead of -objs in Makefile -objs suffix is reserved rather for (user-space) host programs while usually -y suffix is used for kernel drivers (although -objs works for that purpose for now). Let's correct the old usages of -objs in Makefiles. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240508152129.1445372-1-andriy.shevchenko@linux.intel.com>	2024-05-09 18:09:57 +02:00
Ye Bin	26770a717c	jbd2: add prefix 'jbd2' for 'shrink_type' As 'shrink_type' is exported. The module prefix 'jbd2' is added to distinguish from memory reclamation. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240407065355.1528580-3-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-05-09 10:52:31 -04:00
Ye Bin	078760d950	jbd2: use shrink_type type instead of bool type for __jbd2_journal_clean_checkpoint_list() "enum shrink_type" can clearly express the meaning of the parameter of __jbd2_journal_clean_checkpoint_list(), and there is no need to use the bool type. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240407065355.1528580-2-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-05-09 10:52:31 -04:00
Baokun Li	b4b4fda34e	ext4: fix uninitialized ratelimit_state->lock access in __ext4_fill_super() In the following concurrency we will access the uninitialized rs->lock: ext4_fill_super ext4_register_sysfs // sysfs registered msg_ratelimit_interval_ms // Other processes modify rs->interval to // non-zero via msg_ratelimit_interval_ms ext4_orphan_cleanup ext4_msg(sb, KERN_INFO, "Errors on filesystem, " __ext4_msg ___ratelimit(&(EXT4_SB(sb)->s_msg_ratelimit_state) if (!rs->interval) // do nothing if interval is 0 return 1; raw_spin_trylock_irqsave(&rs->lock, flags) raw_spin_trylock(lock) _raw_spin_trylock __raw_spin_trylock spin_acquire(&lock->dep_map, 0, 1, _RET_IP_) lock_acquire __lock_acquire register_lock_class assign_lock_key dump_stack(); ratelimit_state_init(&sbi->s_msg_ratelimit_state, 5 * HZ, 10); raw_spin_lock_init(&rs->lock); // init rs->lock here and get the following dump_stack: ========================================================= INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. CPU: 12 PID: 753 Comm: mount Tainted: G E 6.7.0-rc6-next-20231222 #504 [...] Call Trace: dump_stack_lvl+0xc5/0x170 dump_stack+0x18/0x30 register_lock_class+0x740/0x7c0 __lock_acquire+0x69/0x13a0 lock_acquire+0x120/0x450 _raw_spin_trylock+0x98/0xd0 ___ratelimit+0xf6/0x220 __ext4_msg+0x7f/0x160 [ext4] ext4_orphan_cleanup+0x665/0x740 [ext4] __ext4_fill_super+0x21ea/0x2b10 [ext4] ext4_fill_super+0x14d/0x360 [ext4] [...] ========================================================= Normally interval is 0 until s_msg_ratelimit_state is initialized, so ___ratelimit() does nothing. But registering sysfs precedes initializing rs->lock, so it is possible to change rs->interval to a non-zero value via the msg_ratelimit_interval_ms interface of sysfs while rs->lock is uninitialized, and then a call to ext4_msg triggers the problem by accessing an uninitialized rs->lock. Therefore register sysfs after all initializations are complete to avoid such problems. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240102133730.1098120-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-05-09 10:40:07 -04:00
Chuck Lever	8d915bbf39	NFSD: Force all NFSv4.2 COPY requests to be synchronous We've discovered that delivering a CB_OFFLOAD operation can be unreliable in some pretty unremarkable situations. Examples include: - The server dropped the connection because it lost a forechannel NFSv4 request and wishes to force the client to retransmit - The GSS sequence number window under-flowed - A network partition occurred When that happens, all pending callback operations, including CB_OFFLOAD, are lost. NFSD does not retransmit them. Moreover, the Linux NFS client does not yet support sending an OFFLOAD_STATUS operation to probe whether an asynchronous COPY operation has finished. Thus, on Linux NFS clients, when a CB_OFFLOAD is lost, asynchronous COPY can hang until manually interrupted. I've tried a couple of remedies, but so far the side-effects are worse than the disease and they have had to be reverted. So temporarily force COPY operations to be synchronous so that the use of CB_OFFLOAD is avoided entirely. This is a fix that can easily be backported to LTS kernels. I am working on client patches that introduce an implementation of OFFLOAD_STATUS. Note that NFSD arbitrarily limits the size of a copy_file_range to 4MB to avoid indefinitely blocking an nfsd thread. A short COPY result is returned in that case, and the client can present a fresh COPY request for the remainder. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>	2024-05-09 09:10:48 -04:00
Matthew Wilcox (Oracle)	ea4fd933ab	ext4: remove calls to to set/clear the folio error flag Nobody checks this flag on ext4 folios, stop setting and clearing it. Cc: Theodore Ts'o <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: linux-ext4@vger.kernel.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240420025029.2166544-11-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-05-09 00:23:51 -04:00
Chao Yu	4ed886b187	f2fs: check validation of fault attrs in f2fs_build_fault_attr() - It missed to check validation of fault attrs in parse_options(), let's fix to add check condition in f2fs_build_fault_attr(). - Use f2fs_build_fault_attr() in __sbi_store() to clean up code. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2024-05-09 01:04:46 +00:00
Chao Yu	c521a6ab4a	f2fs: fix to limit gc_pin_file_threshold type of f2fs_inode.i_gc_failures, f2fs_inode_info.i_gc_failures, and f2fs_sb_info.gc_pin_file_threshold is __le16, unsigned int, and u64, so it will cause truncation during comparison and persistence. Unifying variable of these three variables to unsigned short, and add an upper boundary limitation for gc_pin_file_threshold. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2024-05-09 01:03:44 +00:00
Chao Yu	968c4f72b2	f2fs: remove unused GC_FAILURE_PIN After commit `3db1de0e58` ("f2fs: change the current atomic write way"), we removed all GC_FAILURE_ATOMIC usage, let's change i_gc_failures[] array to i_pin_failure for cleanup. Meanwhile, let's define i_current_depth and i_gc_failures as union variable due to they won't be valid at the same time. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2024-05-09 01:03:17 +00:00
Chao Yu	a78118406d	f2fs: use f2fs_{err,info}_ratelimited() for cleanup Commit `b1c9d3f833` ("f2fs: support printk_ratelimited() in f2fs_printk()") missed some cases, cover all remains for cleanup. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2024-05-09 01:02:42 +00:00
Wu Bo	aa4074e8fe	f2fs: fix block migration when section is not aligned to pow2 As for zoned-UFS, f2fs section size is forced to zone size. And zone size may not aligned to pow2. Fixes: `859fca6b70` ("f2fs: swap: support migrating swapfile in aligned write mode") Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Wu Bo <bo.wu@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2024-05-09 00:43:03 +00:00
Gao Xiang	7c35de4df1	erofs: Zstandard compression support Add Zstandard compression as the 4th supported algorithm since it becomes more popular now and some end users have asked this for quite a while [1][2]. Each EROFS physical cluster contains only one valid standard Zstandard frame as described in [3] so that decompression can be performed on a per-pcluster basis independently. Currently, it just leverages multi-call stream decompression APIs with internal sliding window buffers. One-shot or bufferless decompression could be implemented later for even better performance if needed. [1] https://github.com/erofs/erofs-utils/issues/6 [2] https://lore.kernel.org/r/Y08h+z6CZdnS1XBm@B-P7TQMD6M-0146.lan [3] https://www.rfc-editor.org/rfc/rfc8478.txt Acked-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240508234453.17896-1-xiang@kernel.org	2024-05-09 07:46:56 +08:00
Petr Vorel	e2f48c4809	bcachefs: Move BCACHEFS_STATFS_MAGIC value to UAPI magic.h Move BCACHEFS_STATFS_MAGIC value to UAPI <linux/magic.h> under BCACHEFS_SUPER_MAGIC definition (use common approach for name) and reuse the definition in bcachefs_format.h BCACHEFS_STATFS_MAGIC. There are other bcachefs magic definitions: BCACHE_MAGIC, BCHFS_MAGIC, which use UUID_INIT() and are used only in libbcachefs. Therefore move only BCACHEFS_STATFS_MAGIC value, which can be used outside of libbcachefs for f_type field in struct statfs in statfs() or fstatfs(). Suggested-by: Su Yue <glass.su@suse.com> Signed-off-by: Petr Vorel <pvorel@suse.cz> Acked-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	e11ecc6133	bcachefs: Improve sysfs internal/btree_cache Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	c670509134	bcachefs: Allocator prefers not to expand mi.btree_allocated bitmap We now have a small bitmap in the member info section of the superblock for "regions that have btree nodes", so that if we ever have to scan for btree nodes in repair we don't have to scan the whole device(s). This tweaks the allocator to prefer allocating from regions that are already marked in this bitmap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	40574946b8	bcachefs: Better bucket alloc tracepoints Tracepoints are garbage, and perf trace even cuts off some of our fields. Much nicer to just trace a string, and then we can build nicely formatted output with printbufs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	6202569777	bcachefs: Move nocow unlock to bch2_write_endio() This fixes a lifetime issue; bch2_nocow_write_unlock() uses PTR_BUCKET_POS(), which needs the device - but we drop our ref to the device in bch2_write_endio(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	b6d29b5869	bcachefs: kill bch2_dev_bkey_exists() in journal_ptrs_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	f4301b635a	bcachefs: kill bch2_dev_bkey_exists() in discard_one_bucket_fast() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	bc3204c80a	bcachefs: kill bch2_dev_bkey_exists() in check_alloc_info() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	d8585a79be	bcachefs: bch2_dev_have_ref() bch2_dev_bkey_exists() is going away; bch2_dev_have_ref() documents that we're looking up a device without checking if it's present because we have a reference to it already. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:24 -04:00
Kent Overstreet	222eacabc1	bcachefs: kill bch2_dev_bkey_exists() in data_update_init() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	a9422fd404	bcachefs: kill bch2_dev_bkey_exists() in bkey_pick_read_device() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	db39a35dde	bcachefs: pass bch_dev to read_from_stale_dirty_pointer() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	78e9b548f3	bcachefs: bch2_dev_bucket_exists() uses bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	ad897d241b	bcachefs: kill bch2_dev_bkey_exists() in btree_gc.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	9cadb4ea56	bcachefs: bch2_extent_normalize() -> bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	8e3cc2003f	bcachefs: bch2_bkey_has_target() -> bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	8feecbed24	bcachefs: extent_ptr_invalid() -> bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	3858aa4268	bcachefs: ptr_stale() -> dev_ptr_stale() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	302c980a81	bcachefs: extent_ptr_durability() -> bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	3793b3f91f	bcachefs: bch2_extent_merge() -> bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	c387d84413	bcachefs: ec_validate_checksums() -> bch2_dev_tryget() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	8783856ab1	bcachefs: ob_dev() Wrapper around bch2_dev_have_ref() for open_buckets; we do guarantee that the device an open_bucket points to exists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	dbd0408087	bcachefs: move replica_set from bch_dev to bch_fs This is needed for the next patch - the write submit path has to be able to allocate a replica bio even when we weren't able to get a ref on the device. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	633cf06944	bcachefs: Kill bch2_dev_bkey_exists() in backpointer code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	1f2f92ec3f	bcachefs: PTR_BUCKET_POS() now takes bch_dev Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	fa6cce09f0	bcachefs: bch2_dev_iterate() New helper for getting refs to devices as we iterate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	cb4d340a10	bcachefs: bch2_evacuate_bucket() -> bch2_dev_tryget() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	07d7c4da7b	bcachefs: bch2_bucket_ref_update() now takes bch_dev Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	a7f1c26f59	bcachefs: bch2_trigger_alloc() -> bch2_dev_tryget() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:23 -04:00
Kent Overstreet	9b3059a1b3	bcachefs: bch2_check_alloc_key() -> bch2_dev_tryget_noerror() More elimination of bch2_dev_bkey_exists() usage. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	4cd91e2f87	bcachefs: Convert to bch2_dev_tryget_noerror() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	b07eb8252f	bcachefs: bch2_dev_tryget() Most uses of bch2_dev_bkey_exists() are going away, where we assume that because a key references a device the device most exist - instead, we'll be explicitly checking if the device exists and getting a reference to it. This adds the new helpers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	6349b07c25	bcachefs: bch2_have_enough_devs() checks for nonexistent device Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	adf81796ee	bcachefs: journal_replay_entry_early() checks for nonexistent device Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	13a16dabde	bcachefs: bch2_dev_btree_bitmap_marked() -> bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	267039d0fc	bcachefs: Pass device to bch2_bucket_do_index() Eliminating bch2_dev_bkey_exists() uses and replacing them with proper checks; this one was unnecessary since the caller already has it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	f5faf43f85	bcachefs: Pass device to bch2_alloc_write_key() More elimating bch2_dev_bkey_exists() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	23f308ae19	bcachefs: bch2_dev_safe() -> bch2_dev_rcu() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	552aa54865	bcachefs: Debug asserts for ca->ref Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	f295298b8c	bcachefs: New helpers for device refcounts This will be used in the next patch for adding some new debug mode asserts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	e98786ea85	bcachefs: bch2_print_allocator_stuck() If we block on the allocator for more than 10 seconds, print out some useful debugging info. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	9a768ab75b	bcachefs: bch2_bkey_drop_ptrs() declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	b895c70326	bcachefs: x-macroize journal flags enums Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	3a718c0647	bcachefs: On device add, prefer unused slots We can't strictly guarantee that no pointers refer to nonexistent devices - we attempt to, but we need to be safe when the filesystem is corrupt. Therefore, change device_add to try to pick a slot that's never been used, or the slot that's been unused the longest. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	ffcbec6076	bcachefs: Kill opts.buckets_nouse Now explicitly allocate and free the buckets_nouse bitmap - this is going to be used for online fsck. To go RW when we haven't check allocations, we'll do a much slimmed down version that just initializes the buckets_nouse bitmaps. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	abe2f470bc	bcachefs: simplify bch2_trans_start_alloc_update() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:22 -04:00
Kent Overstreet	0acf2169a5	bcachefs: __mark_stripe_bucket() now takes bch_alloc_v4 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Kent Overstreet	be11ae16c4	bcachefs: __mark_pointer now takes bch_alloc_v4 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Kent Overstreet	c02eb9e891	bcachefs: kill bch2_dev_usage_update_m() by using bucket_m_to_alloc() more, we can get some nice code cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Kent Overstreet	fa9bb741fe	bcachefs: alloc_data_type_set() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Kent Overstreet	2685c67d12	bcachefs: dirty_sectors -> replicas_sectors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Kent Overstreet	d3c44cfd5e	bcachefs: delete old gen check bch2_alloc_write_key() this was from metadata only gc - we don't need it anymore Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Youling Tang	75a53a0a23	bcachefs: Correct the FS_IOC_GETFLAGS to FS_IOC32_GETFLAGS in bch2_compat_fs_ioctl() It should be FS_IOC32_GETFLAGS instead of FS_IOC_GETFLAGS in compat ioctl. Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Youling Tang	9862022d09	bcachefs: Fix error path of bch2_link_trans() In bch2_link_trans(), if bch2_inode_nlink_inc() fails, it needs to call bch2_trans_iter_exit() in the error path. Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Youling Tang	36aa49d33e	bcachefs: Change destroy_inode to free_inode The vfs[1] documentation describes free_inode as follows: ``` free_inode this method is called from RCU callback. If you use call_rcu() in ->destroy_inode to free ‘struct inode’ memory, then it’s better to release memory in this method. ``` free_inode will be called by the RCU callback, so it might be better to move the inode free operation to destroy_inode. Similar to commit `ae6b47b565` ("fs/ntfs3: Change destroy_inode to free_inode"). Link: [1]: https://www.kernel.org/doc/html/latest/filesystems/vfs.html Signed-off-by: Youling Tang <tangyouling@kylinos.cn> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Kent Overstreet	c8bda9f20a	bcachefs: Simplify resuming of journal position Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Kent Overstreet	83c38e3ef8	bcachefs: check inode backpointer in bch2_lookup() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Kent Overstreet	4da1713a8d	bcachefs: check for inodes that should have backpointers in fsck Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Kent Overstreet	45150765d3	bcachefs: bch_member.last_journal_bucket On recovery from clean shutdown we don't typically read the journal, but we still want to avoid overwriting existing entries in the journal for list_journal debugging. Thus, add some fields to the member info section so we can remember where we left off. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Kent Overstreet	c749541353	bcachefs: uninline set_btree_iter_dontneed() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Hongbo Li	0af0b963b5	bcachefs: eliminate the uninitialized compilation warning in bch2_reconstruct_snapshots When compiling the bcachefs-tools, the following compilation warning is reported: libbcachefs/snapshot.c: In function ‘bch2_reconstruct_snapshots’: libbcachefs/snapshot.c:915:19: warning: ‘tree_id’ may be used uninitialized in this function [-Wmaybe-uninitialized] 915 \| snapshot->v.tree = cpu_to_le32(tree_id); libbcachefs/snapshot.c:903:6: note: ‘tree_id’ was declared here 903 \| u32 tree_id; \| ^~~~~~~ This is a false alert, because @tree_id is changed in bch2_snapshot_tree_create after it returns 0. And if this function returns other value, @tree_id wouldn't be used. Thus there should be nothing wrong in logical. Although the report itself is a false alert, we can still make it more explicit by setting the initial value of @tree_id to 0 (an invalid tree ID). Fixes: `a292be3b68` ("bcachefs: Reconstruct missing snapshot nodes") Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Kent Overstreet	56522d7276	bcachefs: fix btree_path_clone() ip_allocated Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Nathan Chancellor	8bb0eddbbc	bcachefs: Fix format specifiers in bch2_btree_key_cache_to_text() When building for a 32-bit target, for which 'size_t' is 'unsigned int', there are two warnings around mismatched format specifiers and argument types: In file included from fs/bcachefs/vstructs.h:5, from fs/bcachefs/bcachefs_format.h:79, from fs/bcachefs/bcachefs.h:207, from fs/bcachefs/btree_key_cache.c:3: fs/bcachefs/btree_key_cache.c: In function 'bch2_btree_key_cache_to_text': fs/bcachefs/btree_key_cache.c:1046:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] 1046 \| prt_printf(out, "nonpcpu freelist:\t%lu\r\n", bc->nr_freed_nonpcpu); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~ \| \| \| size_t {aka unsigned int} fs/bcachefs/util.h:192:63: note: in definition of macro 'prt_printf' 192 \| #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) \| ^~~~~~~~~~~ fs/bcachefs/btree_key_cache.c:1046:47: note: format string is defined here 1046 \| prt_printf(out, "nonpcpu freelist:\t%lu\r\n", bc->nr_freed_nonpcpu); \| ~~^ \| \| \| long unsigned int \| %u fs/bcachefs/btree_key_cache.c:1047:25: error: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=] 1047 \| prt_printf(out, "pcpu freelist:\t%lu\r\n", bc->nr_freed_pcpu); \| ^~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~ \| \| \| size_t {aka unsigned int} fs/bcachefs/util.h:192:63: note: in definition of macro 'prt_printf' 192 \| #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__) \| ^~~~~~~~~~~ fs/bcachefs/btree_key_cache.c:1047:44: note: format string is defined here 1047 \| prt_printf(out, "pcpu freelist:\t%lu\r\n", bc->nr_freed_pcpu); \| ~~^ \| \| \| long unsigned int \| %u cc1: all warnings being treated as error Use the proper 'size_t' specifier, '%zu', to clear up the warnings for these platforms. Fixes: f2d47ec26af5 ("bcachefs: Btree key cache instrumentation") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Nathan Chancellor	2d288745eb	bcachefs: Fix type of flags parameter for some ->trigger() implementations When building with clang's -Wincompatible-function-pointer-types-strict (a warning designed to catch potential kCFI failures at build time), there are several warnings along the lines of: fs/bcachefs/bkey_methods.c:118:2: error: incompatible function pointer types initializing 'int ()(struct btree_trans , enum btree_id, unsigned int, struct bkey_s_c, struct bkey_s, enum btree_iter_update_trigger_flags)' with an expression of type 'int (struct btree_trans *, enum btree_id, unsigned int, struct bkey_s_c, struct bkey_s, unsigned int)' [-Werror,-Wincompatible-function-pointer-types-strict] 118 \| BCH_BKEY_TYPES() \| ^~~~~~~~~~~~~~~~ fs/bcachefs/bcachefs_format.h:394:2: note: expanded from macro 'BCH_BKEY_TYPES' 394 \| x(inode, 8) \ \| ^~~~~~~~~~~~~~~~~~~~~~~~~~ fs/bcachefs/bkey_methods.c:117:41: note: expanded from macro 'x' 117 \| #define x(name, nr) [KEY_TYPE_##name] = bch2_bkey_ops_##name, \| ^~~~~~~~~~~~~~~~~~~~ <scratch space>:277:1: note: expanded from here 277 \| bch2_bkey_ops_inode \| ^~~~~~~~~~~~~~~~~~~ fs/bcachefs/inode.h:26:13: note: expanded from macro 'bch2_bkey_ops_inode' 26 \| .trigger = bch2_trigger_inode, \ \| ^~~~~~~~~~~~~~~~~~ There are several functions that did not have their flags parameter converted to 'enum btree_iter_update_trigger_flags' in the recent unification, which will cause kCFI failures at runtime because the types, while ABI compatible (hence no warning from the non-strict version of this warning), do not match exactly. Fix up these functions (as well as a few other obvious functions that should have it, even if there are no warnings currently) to resolve the warnings and potential kCFI runtime failures. Fixes: 31e4ef3280c8 ("bcachefs: iter/update/trigger/str_hash flag cleanup") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Kent Overstreet	24b27975a9	bcachefs: Kill gc_init_recurse() This unifies the online and offline btree gc passes; we're not yet running it online. We now iterate over one level of the btree at a time - the same as check_extents_to_backpointers(); this ordering preserves order of keys regardless of btree splits and merges, which will be important when we re-enable online gc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:21 -04:00
Kent Overstreet	c451986bf4	bcachefs: do reflink_p repair from BTREE_TRIGGER_check_repair Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	f40d13f94d	bcachefs: Run bch2_check_fix_ptrs() via triggers Currently, the reflink_p gc trigger does repair as well - turning a reflink_p key into an error key if the reflink_v it points to doesn't exist. This won't work with online check/repair, because the repair path once online will be subject to transaction restarts, but BTREE_TRIGGER_gc is not idempotant - we can't run it multiple times if we get a transaction restart. So we need to split these paths; to do so this patch calls check_fix_ptrs() by a new general path - a new trigger type, BTREE_TRIGGER_check_repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	930e1a92d6	bcachefs: kill gc looping for bucket gens looping when we change a bucket gen is not ideal - it means we risk failing if we'd go into an infinite loop, and it's better to make forward progress even if fsck doesn't fix everything. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	70e3e039cf	bcachefs: bch2_bucket_ref_update() If we hit an inconsistency when updating allocation information, we don't want to fail the update if it's for a deletion - only if it's for a new key. Rename check_bucket_ref() -> bucket_ref_update() so we can centralize the logic to do this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	9cc455d1bc	bcachefs: Consolidate mark_stripe_bucket() and trans_mark_stripe_bucket() This eliminates some duplicated logic, and the gc path now handles stripe updates and deletions - we need this since soon we're bringing back runtime gc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	d930764650	bcachefs: mark_stripe_bucket cleanup Start to work on unifying mark_stripe_bucket() and trans_mark_stripe_bucket(); first, clean up all the unnecessary and gratuitious differences. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	c4e8db2b5d	bcachefs: bucket_data_type_mismatch() We're working on potentially unifying bch2_check_bucket_ref() and bch2_check_fix_ptrs() - or at least eliminating gratuitious differences. Most immediately, there's a bunch of cleanups to be done regarding BCH_DATA_stripe. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	b769590f33	bcachefs: Clean up inode alloc There's no need to be using new_inode(); we can skip all that indirection and make the code easier to follow. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	f04158290d	bcachefs: journal seq blacklist gc no longer has to walk btree Since btree_ptr_v2, we no longer require the journal seq blacklist table for skipping blacklisted bsets (btree node entries); the pointer to a given node indicates how much data is present. Therefore there's no longer any need for journal seq blacklist gc to walk the btree - we can prune entries older than journal last_seq. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	e7f63c67fc	bcachefs: plumb data_type into bch2_bucket_alloc_trans() prep work for making the allocator try to keep btree nodes within the existing member info btree allocated bitmap Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	018b32a63f	bcachefs: Add btree_allocated_bitmap to member_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	5147b9ae76	bcachefs: Btree key cache instrumentation It turns out the btree key cache shrinker wasn't actually reclaiming anything, prior to the previous patch. This adds instrumentation so that if we have further issues we can see what's going on. Specifically, sysfs internal/btree_key_cache is greatly expanded with new counters, and the SRCU sequence numbers of the first 10 entries on each pending freelist, and we also add trigger_btree_key_cache_shrink for testing without having to prune all the system caches. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Matthew Wilcox (Oracle)	e4f2c4dfee	bcachefs: Remove calls to folio_set_error Common code doesn't test the error flag, so we don't need to set it in bcachefs. We can use folio_end_read() to combine the setting (or not) of the uptodate flag and clearing the lock flag. Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Brian Foster <bfoster@redhat.com> Cc: linux-bcachefs@vger.kernel.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	103304021e	bcachefs: Move gc of bucket.oldest_gen to workqueue This is a nice cleanup - and we've also been having problems with kthread creation in the mount path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	b25fd02ab4	bcachefs: fix flag printing in journal_buf_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	aef7eecb57	bcachefs: Sync journal when we complete a recovery pass Make things easier when we're debugging long fsck runs - persist the work that successful recovery passes did. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	f7643bc974	bcachefs: make btree read errors silent during scan Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	5a2d15213d	bcachefs: Rip bch2_snapshot_equiv() out of fsck Originally, when deleting snapshots we didn't collapse redundant snapshot nodes; thus, the notion of a class of equivalent snapshot nodes leaked into fsck. Now we do, so snapshot ID equivalence classes are purely local to snapshot deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	9de40d77f0	bcachefs: Check for writing btree_ptr_v2.sectors_written == 0 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	60f2b1bcf5	bcachefs: Add asserts to bch2_dev_btree_bitmap_marked_sectors() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:20 -04:00
Kent Overstreet	427e1bb838	bcachefs: fs_alloc_debug_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	feb255537d	bcachefs: assert that online_reserved == 0 on shutdown Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	fd104e2967	bcachefs: bch2_trans_verify_not_unlocked() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	e590e4e222	bcachefs: bch2_btree_path_can_relock() With the new assertions, we shouldn't be holding locks when trans->locked is false, thus, we shouldn't use relock when we just want to check if we can relock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	650db8a87c	bcachefs: trans->locked Add a field for tracking whether a transaction object holds btree locks, and assertions to verify state. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	e2e568bd97	bcachefs: bch2_btree_root_alloc_fake_trans() We're starting to be more strict about transaction locked state, and multiple transactions in a task. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	ca563dccb2	bcachefs: bch2_trans_unlock() must always be followed by relock() or begin() We're about to add new asserts for btree_trans locking consistency, and part of that requires that aren't using the btree_trans while it's unlocked. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	4984faff5d	bcachefs: Use bch2_btree_path_upgrade() in key cache traverse Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	5d8c9d9428	bcachefs: bch2_btree_path_upgrade() checks nodes_locked, not uptodate In the key cache fill path, we use path_upgrade() on a path that isn't uptodate yet but should be locked. This change makes bch2_btree_path_upgrade() slightly looser so we can use it in key cache upgrade, instead of the __ version. Also, make the related assert - that path->uptodate implies nodes_locked - slightly clearer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	f2d9823f46	bcachefs: maintain lock invariants in btree_iter_next_node() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	449ceafb49	bcachefs: bch2_trans_commit_flags_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	b7f10636d5	bcachefs: prefer drop_locks_do() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	91b5d97fdf	bcachefs: get_unlocked_mut_path -> bch2_path_get_unlocked_mut Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Lukas Bulwahn	d434c2398f	bcachefs: fix typo in reference to BCACHEFS_DEBUG Commit `ec9cc18fc2` ("bcachefs: Add checks for invalid snapshot IDs") intends to check the sanity of a snapshot and panic when BCACHEFS_DEBUG is set, but that conditional has a typo. Fix the typo to refer to the actual existing Kconfig symbol. This was found with ./scripts/checkkconfigsymbols.py. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Ricardo B. Marliere	af3b39b4c6	bcachefs: chardev: make bch_chardev_class constant Since commit `43a7206b09` ("driver core: class: make class_register() take a const *"), the driver core allows for struct class to be in read-only memory, so move the bch_chardev_class structure to be declared at build time placing it into read-only memory, instead of having to be dynamically allocated at boot time. Also, correctly clean up after failing paths in bch2_chardev_init(). Cc: Hongbo Li <lihongbo22@huawei.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	2f724563fc	bcachefs: member helper cleanups Some renaming for better consistency bch2_member_exists -> bch2_member_alive bch2_dev_exists -> bch2_member_exists bch2_dev_exsits2 -> bch2_dev_exists bch_dev_locked -> bch2_dev_locked bch_dev_bkey_exists -> bch2_dev_bkey_exists new helper - bch2_dev_safe Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	d155272b6e	bcachefs: bucket_valid() cut out a branch from doing it the obvious way Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	923ed0ae5e	bcachefs: bch2_trans_relock_fail() - factor out slowpath Factor out slowpath into a separate helper Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	0c0cbfdb84	bcachefs: bch2_dir_emit() - drop_locks_do() conversion Add a new helper that calls dir_emit() and updates ctx->pos on success; this lets us convert bch2_readdir() to drop_locks_do(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:19 -04:00
Kent Overstreet	65bd442397	bcachefs: bch2_btree_insert_trans() no longer specifies BTREE_ITER_cached Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	5dd8c60e1e	bcachefs: iter/update/trigger/str_hash flag cleanup Combine iter/update/trigger/str_hash flags into a single enum, and x-macroize them for a to_text() function later. These flags are all for a specific iter/key/update context, so it makes sense to group them together - iter/update/trigger flags were already given distinct bits, this cleans up and unifies that handling. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	bf5f6a689b	bcachefs: __BTREE_ITER_ALL_SNAPSHOTS -> BTREE_ITER_SNAPSHOT_FIELD Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	c281db0fa5	bcachefs: mark_superblock cleanup Consolidate mark_superblock() and trans_mark_superblock(), like we did with the other trigger paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	ba665494fb	bcachefs: gc_btree_init_recurse() uses gc_mark_node() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	d1adfe4e7e	bcachefs: move root node topo checks to node_check_topology() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	b982d645a4	bcachefs: move topology repair kick to gc_btrees() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	58dda9c10e	bcachefs: kill metadata only gc Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	d1b213a00d	bcachefs: Finish converting reconstruct_alloc to errors_silent with errors_silent, reconstruct_alloc no longer requires fsck and fix_errors to work Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	68e142405c	bcachefs: bch2_gc() is now private to btree_gc.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	665e8b3239	bcachefs: for_each_btree_key_continue() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	a21107eeb1	bcachefs: kill for_each_btree_key_old() Dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kuan-Wei Chiu	0ddb5f0854	bcachefs: Optimize eytzinger0_sort() with bottom-up heapsort This optimization reduces the average number of comparisons required from 2nlog2(n) - 3n + o(n) to nlog2(n) + 0.37n + o(n). When n is sufficiently large, it results in approximately 50% fewer comparisons. Currently, eytzinger0_sort employs the textbook version of heapsort, where during the heapify process, each level requires two comparisons to determine the maximum among three elements. In contrast, the bottom-up heapsort, during heapify, only compares two children at each level until reaching a leaf node. Then, it backtracks from the leaf node to find the correct position. Since heapify typically continues until very close to the leaf node, the standard heapify requires about 2log2(n) comparisons, while the bottom-up variant only needs log2(n) comparisons. The experimental data presented below is based on an array generated by get_random_u32(). \| N \| comparisons(old) \| comparisons(new) \| time(old) \| time(new) \| \|-------\|------------------\|------------------\|-----------\|-----------\| \| 10000 \| 235381 \| 136615 \| 25545 us \| 20366 us \| \| 20000 \| 510694 \| 293425 \| 31336 us \| 18312 us \| \| 30000 \| 800384 \| 457412 \| 35042 us \| 27386 us \| \| 40000 \| 1101617 \| 626831 \| 48779 us \| 38253 us \| \| 50000 \| 1409762 \| 799637 \| 62238 us \| 46950 us \| \| 60000 \| 1721191 \| 974521 \| 75588 us \| 58367 us \| \| 70000 \| 2038536 \| 1152171 \| 90823 us \| 68778 us \| \| 80000 \| 2362958 \| 1333472 \| 104165 us \| 78625 us \| \| 90000 \| 2690900 \| 1516065 \| 116111 us \| 89573 us \| \| 100000\| 3019413 \| `1699879` \| 133638 us \| 100998 us \| Refs: BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average, QUICKSORT (if n is not very small) Ingo Wegener Theoretical Computer Science, 118(1); Pages 81-98, 13 September 1993 https://doi.org/10.1016/0304-3975(93)90364-Y Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	be31bf439c	bcachefs: When traversing to interior nodes, propagate result to paths to same leaf node Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	4dcd90b6d1	bcachefs: Don't read journal just for fsck reading the journal can take a decent amount of time compared to the rest of fsck, let's only read it when required. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	19391b9294	bcachefs: allow for custom action in fsck error messages Be more explicit to the user about what we're doing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	497c982f05	bcachefs: New assertion for writing to the journal after shutdown Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	00589cadb1	bcachefs: bch2_btree_path_to_text() Long form version of bch2_btree_path_to_text() - useful in error messages and tracepoints. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	5577881455	bcachefs: add btree_node_merging_disabled debug param Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:18 -04:00
Kent Overstreet	ac01928b8e	bcachefs: bch2_hash_lookup() now returns bkey_s_c small cleanup Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:17 -04:00
Kent Overstreet	6ab71b4a8e	bcachefs: bch2_journal_keys_dump() debug helper Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:17 -04:00
Kent Overstreet	9089376f70	bcachefs: bch2_btree_node_header_to_text() better btree node read path error messages Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:17 -04:00
Kent Overstreet	7423330e30	bcachefs: prt_printf() now respects \r\n\t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:17 -04:00
Kent Overstreet	2dcb605e86	bcachefs: printbufs: prt_printf() now handles \t\r\n Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:17 -04:00
Kent Overstreet	acce32a51e	bcachefs: printbuf improvements - fix assorted (harmless) off-by-one errors - we were inconsistent on whether out->pos stays <= out->size on overflow; now it does, and printbuf.overflow exists to indicate if a printbuf has overflowed - factor out printbuf_advance_pos() - printbuf_nul_terminate_reserved(); use this to reduce the number of printbuf_make_room() calls Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:17 -04:00
Kent Overstreet	62606398d5	bcachefs: Run upgrade/downgrade even in -o nochanges mode We need to be able to test these paths in dry run mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:17 -04:00
Kent Overstreet	6d82869185	bcachefs: Better write_super() error messages When a superblock write is silently dropped or it's been modified by another process we need to know which device it was. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 17:29:17 -04:00
Kent Overstreet	74768337de	bcachefs: Fix xattr_to_text() unsafety Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 14:57:19 -04:00
Kent Overstreet	61692c7812	bcachefs: bch2_bkey_format_field_overflows() Fix another shift-by-64 by factoring out a common helper for bch2_bkey_format_invalid() and bformat_needs_redo() (where it was already fixed). Reported-by: syzbot+9833a1d29d4a44361e2c@syzkaller.appspotmail.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 14:57:19 -04:00
Kent Overstreet	5dfd3746b6	bcachefs: Fix needs_whiteout BUG_ON() in bkey_sort() Btree nodes are log structured; thus, we need to emit whiteouts when we're deleting a key that's been written out to disk. k->needs_whiteout tracks whether a key will need a whiteout when it's deleted, and this requires some careful handling; e.g. the key we're deleting may not have been written out to disk, but it may have overwritten a key that was - thus we need to carry this flag around on overwrites. Invariants: There may be multiple key for the same position in a given node (because of overwrites), but only one of them will be a live (non deleted) key, and only one key for a given position will have the needs_whiteout flag set. Additionally, we don't want to carry around whiteouts that need to be written in the main searchable part of a btree node - btree_iter_peek() will have to skip past them, and this can lead to an O(n^2) issues when doing sequential deletions (e.g. inode rm/truncate). So there's a separate region in the btree node buffer for unwritten whiteouts; these are merge sorted with the rest of the keys we're writing in the btree node write path. The unwritten whiteouts was a later optimization that bch2_sort_keys() didn't take into account; the unwritten whiteouts area means that we never have deleted keys with needs_whiteout set in the main searchable part of a btree node. That means we can simplify and optimize some sort paths, and eliminate an assertion that syzbot found: - Unless we're in the btree node write path, it's always ok to drop whiteouts when sorting - When sorting for a btree node write, we drop the whiteout if it's not from the unwritten whiteouts area, or if it's overwritten by a real key at the same position. This completely eliminates some tricky logic for propagating the needs_whiteout flag: syzbot was able to hit the assertion that checked that there shouldn't be more than one key at the same pos with needs_whiteout set, likely due to a combination of flipping on needs_whiteout on all written keys (they need whiteouts if overwritten), combined with not always dropping unneeded whiteouts, and the tricky logic in the sort path for preserving needs_whiteout that wasn't really needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 14:56:09 -04:00
Kent Overstreet	5ad1f33c29	bcachefs: Fix sb_clean_validate endianness conversion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-05-08 14:56:09 -04:00
Linus Torvalds	45db3ab700	five ksmbd server fixes, all also for stable -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmY6/AAACgkQiiy9cAdy T1FJIgv/ZUwoOodrFevFTrRFtQLS/ssRxgX69FEIWXpzHqeU8olsC8P2ywM974ba ATsiLmfpdreBilnW5DCHFLtJwPb1py2KzqlwYYbh7sdU3d+mGGLX6r1ucn9tKNl3 fWNCHUe8Dz3qRaKkpNFmzS3sXaekr/ZT3SsoJyYg/d8Z7fqXsTy7auo2pVXRiYFp TacIaGDc2Tw7fyf6Dt9o9YuSCOmGXaj9pUlTHrW17/cYXDMsQD+UcaNu93uuyZjo i6xvN1npZDec3x2j6a0YV159fWfag4hR7GxtwBEg0Ltzm3XSL5v0ljtFpeNGlehg u0TV5Tcfx8pEtcfaFdHbNXC5ih2vDMN2Yts0K8WGAWbcECs+XlvCJnYyvHGFVequ pCZuUGcrXM+0EqYnVTBMdY7lk3We8HbeZsbGjQA23MG9Bd537sBEdGpsA7ya43nJ kFK/ky8PjQ+BFpweGKL27fNULXZTSu+1D+IP+XgqksxKM5LYzWkvLAyVdUy+aNdA 6+MqIZIs =Ee/V -----END PGP SIGNATURE----- Merge tag '6.9-rc7-ksmbd-fixes' of git://git.samba.org/ksmbd Pull smb server fixes from Steve French: "Five ksmbd server fixes, all also for stable - Three fixes related to SMB3 leases (fixes two xfstests, and a locking issue) - Unitialized variable fix - Socket creation fix when bindv6only is set" * tag '6.9-rc7-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: do not grant v2 lease if parent lease key and epoch are not set ksmbd: use rwsem instead of rwlock for lease break ksmbd: avoid to send duplicate lease break notifications ksmbd: off ipv6only for both ipv4/ipv6 binding ksmbd: fix uninitialized symbol 'share' in smb2_tree_connect()	2024-05-08 10:39:53 -07:00
Linus Torvalds	065a057a31	fuse fixes for 6.9 final -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCZjsr0QAKCRDh3BK/laaZ PLrpAP9Y1Kz3gSSH1wqDJ9+XzQZdm4dSInMP2Pe47BvSGG2YlAEAwmccoyIoiM58 qvHPETImNxIRTAVZdiBM3W4S3hnzCwc= =SPoy -----END PGP SIGNATURE----- Merge tag 'fuse-fixes-6.9-final' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "Two one-liner fixes for issues introduced in -rc1" * tag 'fuse-fixes-6.9-final' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: virtiofs: include a newline in sysfs tag fuse: verify zero padding in fuse_backing_map	2024-05-08 10:33:55 -07:00
Linus Torvalds	fe35bf27a1	Description for this pull request: - Fix xfstests generic/013 test failure with dirsync mount option. - Initialize the reserved fields of deleted file and stream extension dentries to zero. -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmY7WcoWHGxpbmtpbmpl b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCMxTD/9+qFI6cEfe06Xt6RswN/RDMWrZ ZDzUjT7VATLSyjoiaeyJeCaK9/PCrJuX9+vNybq6W0TqfHzIYDmFn7Wg6HjQrZAJ 0XhiaqVwlQ2/UY4yiv7glJRKFsdgJdo3XhFfTWzV5Eaaj65QFHPjlQMo3tOrZzp9 HsO4+DwIFah2uvehKF8numJBXSZ7uoOELHnlL05A3xSmLAxY+HeueqbkQubv1r11 mIIfvmcdxnXlzdpgs1c+a0KXVg/4/0F+SZKYP+JL5x1N2xpc4y0cWsQgrfXY+7Id fPx6CoRYkchfUFGf/LlX/LKchMO/EuK3q3Q17+zoKfgJgdPbp8TkDpfur9iUOxgy 16wyq/iIPKWEFsMYLtqYN/dlNJ+fmVUVDF457VLNYYEFdDQbp8/VosGn4ct0CBQe E1uzwJlv/iUlBNFX679dNxDewAiBtIat2wyAChCauLK6a1bzHCIDpGUlS88ggBAd OLFvQgzRKILqd8fibb2VV46V/CY3R8SmVCzDBixPFmCJtNZas9crd3UXp1xNvPGA LHDnASkpUHSMQoQN0yfMGfvRosQD7wlJYw1mhMlDq35Z2IJg2HKKSESf2axOc5Z0 25AxNZ8xfgjBNiFfDQI0mClliXnz9GTRGt4LqBVS+YHjdbPYqCHNsvJDbR0r1ZM7 OzYIaxTVoTKtYsurgw== =zS+L -----END PGP SIGNATURE----- Merge tag 'exfat-for-6.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat fixes from Namjae Jeon: - Fix xfstests generic/013 test failure with dirsync mount option - Initialize the reserved fields of deleted file and stream extension dentries to zero * tag 'exfat-for-6.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: zero the reserved fields of file and stream extension dentries exfat: fix timing of synchronizing bitmap and inode	2024-05-08 10:30:13 -07:00
Mateusz Guzik	7f016edaa0	fscrypt: try to avoid refing parent dentry in fscrypt_file_open Merely checking if the directory is encrypted happens for every open when using ext4, at the moment refing and unrefing the parent, costing 2 atomics and serializing opens of different files. The most common case of encryption not being used can be checked for with RCU instead. Sample result from open1_processes -t 20 ("Separate file open/close") from will-it-scale on Sapphire Rapids (ops/s): before: 12539898 after: 25575494 (+103%) v2: - add a comment justifying rcu usage, submitted by Eric Biggers - whack spurious IS_ENCRYPTED check from the refed case Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240508081400.422212-1-mjguzik@gmail.com Signed-off-by: Eric Biggers <ebiggers@google.com>	2024-05-08 10:28:58 -07:00
Linus Torvalds	f5fcbc8b43	bcachefs fixes for 6.9 - Various syzbot fixes; mainly small gaps in validation - Fix an integer overflow in fiemap() which was preventing filefrag from returning the full list of extents - Fix a refcounting bug on the device refcount, turned up by new assertions in the development branch - Fix a device removal/readd bug; write_super() was repeatedly dropping and retaking bch_dev->io_ref references -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEKnAFLkS8Qha+jvQrE6szbY3KbnYFAmY6Qz8ACgkQE6szbY3K bnZLRA/9F5dEcNF8mSuVZqJNgzzFAXgL59GZuRMMz5ECJQRTXyHB2c3N2MG6Htxg bJuyT47icTWibIqUrNJubkCaCl9MV9sl3uPRfxF9tVbiOYQg5WhE0UUhprJq8zfi YZ+wlAdPQhPHBgieycF9LIiIzjEGZcYpg8NgCFtdaU9Rxk3aBYyBuD051imvMBqH x0JEibtrIp26u6FScuH5FH5Ro+ysXgw8HZdM0j/9I9WIiYgpya6EbbVqeuXzL3Di scj4vQwA1YoVDw9eUUgXNJq+xD9m6YqJv395imDDWN7sFQm+jGNCossvj0qUKi8m 7QVup6zaO7yNYFJy84/iZCnSC/C7zs1iFJUM6gidRRArkjr7Qw8KAPtIXGRVUM2M 9ogY6Af5u8ie7qVV1NhcULIhCjiOSINUw9uJGYUwv+XtcCfjZb7maBwfvtnFa9VV kQXeoJ/dqVXqCpvnqjQbVej+I8SXCc/s9EPXD2+SHkHzDKAuJkWKzPkQGL3kTSRT 8FPfusF0NDYLTJPOh4MdzuK79YGRQvrPaRv/JyhSAyWsUubACkwmLCyZQUNVAV/f 6WaFoEYCv4coASQNsVnmISPlsoKbLwOtEZDBr14uY9CArKSsCW26QJOKyg4B7tF8 J2DU6sIy+Tzq+TiTkWV5IE/ibQijOIB2/06+5KcM7npsFJldHWs= =rlea -----END PGP SIGNATURE----- Merge tag 'bcachefs-2024-05-07.2' of https://evilpiepirate.org/git/bcachefs Pull bcachefs fixes from Kent Overstreet: - Various syzbot fixes; mainly small gaps in validation - Fix an integer overflow in fiemap() which was preventing filefrag from returning the full list of extents - Fix a refcounting bug on the device refcount, turned up by new assertions in the development branch - Fix a device removal/readd bug; write_super() was repeatedly dropping and retaking bch_dev->io_ref references * tag 'bcachefs-2024-05-07.2' of https://evilpiepirate.org/git/bcachefs: bcachefs: Add missing sched_annotate_sleep() in bch2_journal_flush_seq_async() bcachefs: Fix race in bch2_write_super() bcachefs: BCH_SB_LAYOUT_SIZE_BITS_MAX bcachefs: Add missing skcipher_request_set_callback() call bcachefs: Fix snapshot_t() usage in bch2_fs_quota_read_inode() bcachefs: Fix shift-by-64 in bformat_needs_redo() bcachefs: Guard against unknown k.k->type in __bkey_invalid() bcachefs: Add missing validation for superblock section clean bcachefs: Fix assert in bch2_alloc_v4_invalid() bcachefs: fix overflow in fiemap bcachefs: Add a better limit for maximum number of buckets bcachefs: Fix lifetime issue in device iterator helpers bcachefs: Fix bch2_dev_lookup() refcounting bcachefs: Initialize bch_write_op->failed in inline data path bcachefs: Fix refcount put in sb_field_resize error path bcachefs: Inodes need extra padding for varint_decode_fast() bcachefs: Fix early error path in bch2_fs_btree_key_cache_exit() bcachefs: bucket_pos_to_bp_noerror() bcachefs: don't free error pointers bcachefs: Fix a scheduler splat in __bch2_next_write_buffer_flush_journal_buf()	2024-05-08 10:23:18 -07:00
Allen Pais	4bbf9c3b53	fs/coredump: Enable dynamic configuration of max file note size Introduce the capability to dynamically configure the maximum file note size for ELF core dumps via sysctl. Why is this being done? We have observed that during a crash when there are more than 65k mmaps in memory, the existing fixed limit on the size of the ELF notes section becomes a bottleneck. The notes section quickly reaches its capacity, leading to incomplete memory segment information in the resulting coredump. This truncation compromises the utility of the coredumps, as crucial information about the memory state at the time of the crash might be omitted. This enhancement removes the previous static limit of 4MB, allowing system administrators to adjust the size based on system-specific requirements or constraints. Eg: $ sysctl -a \| grep core_file_note_size_limit kernel.core_file_note_size_limit = 4194304 $ sysctl -n kernel.core_file_note_size_limit 4194304 $echo 519304 > /proc/sys/kernel/core_file_note_size_limit $sysctl -n kernel.core_file_note_size_limit 519304 Attempting to write beyond the ceiling value of 16MB $echo 17194304 > /proc/sys/kernel/core_file_note_size_limit bash: echo: write error: Invalid argument Signed-off-by: Vijay Nag <nagvijay@microsoft.com> Signed-off-by: Allen Pais <apais@linux.microsoft.com> Link: https://lore.kernel.org/r/20240506193700.7884-1-apais@linux.microsoft.com Signed-off-by: Kees Cook <keescook@chromium.org>	2024-05-08 09:53:00 -07:00
Ryusuke Konishi	91d743a9c8	nilfs2: make superblock data array index computation sparse friendly Upon running sparse, "warning: dubious: x & !y" is output at an array index calculation within nilfs_load_super_block(). The calculation is not wrong, but to eliminate the sparse warning, replace it with an equivalent calculation. Also, add a comment to make it easier to understand what the unintuitive array index calculation is doing and whether it's correct. Link: https://lkml.kernel.org/r/20240430080019.4242-3-konishi.ryusuke@gmail.com Fixes: `e339ad31f5` ("nilfs2: introduce secondary super block") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: kernel test robot <lkp@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-05-08 08:41:28 -07:00
Matthew Wilcox (Oracle)	bbf45b7e68	squashfs: remove calls to set the folio error flag Nobody checks the error flag on squashfs folios, so stop setting it. Link: https://lkml.kernel.org/r/20240420025029.2166544-24-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-05-08 08:41:28 -07:00
Matthew Wilcox (Oracle)	675f02e5e6	squashfs: convert squashfs_symlink_read_folio to use folio APIs Remove use of page APIs, return the errno instead of 0, switch from kmap_atomic to kmap_local and use folio_end_read() to unify the two exit paths. Link: https://lkml.kernel.org/r/20240420025029.2166544-23-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-05-08 08:41:28 -07:00
Colin Ian King	f492fb3656	ocfs2: remove redundant assignment to variable status Variable status is being assigned and error code that is never read, it is being assigned inside of a do-while loop. The assignment is redundant and can be removed. Cleans up clang scan build warning: fs/ocfs2/dlm/dlmdomain.c:1530:2: warning: Value stored to 'status' is never read [deadcode.DeadStores] Link: https://lkml.kernel.org/r/20240423223018.1573213-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-05-08 08:41:27 -07:00
Eric Sandeen	36defdd9d7	nilfs2: convert to use the new mount API Convert nilfs2 to use the new mount API. [sandeen@redhat.com: v2] Link: https://lkml.kernel.org/r/33d078a7-9072-4d8e-a3a9-dec23d4191da@redhat.com Link: https://lkml.kernel.org/r/20240425190526.10905-1-konishi.ryusuke@gmail.com [konishi.ryusuke: fixed missing SB_RDONLY flag repair in nilfs_reconfigure] Link: https://lkml.kernel.org/r/33d078a7-9072-4d8e-a3a9-dec23d4191da@redhat.com Link: https://lkml.kernel.org/r/20240424182716.6024-1-konishi.ryusuke@gmail.com Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-05-08 08:41:27 -07:00
Gao Xiang	d69189428d	erofs: clean up z_erofs_load_full_lcluster() Only four lcluster types here, remove redundant code. No real logic changes. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240508123357.3266173-1-hsiangkao@linux.alibaba.com	2024-05-08 20:36:43 +08:00
Hongzhen Luo	1872df8dcd	erofs: derive fsid from on-disk UUID for .statfs() if possible Use the superblock's UUID to generate the fsid when it's non-null. Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Signed-off-by: Hongzhen Luo <hongzhen@linux.alibaba.com> Link: https://lore.kernel.org/r/20240409113022.74720-1-hongzhen@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-05-08 17:12:51 +08:00
Chunhai Guo	0f6273ab46	erofs: add a reserved buffer pool for lz4 decompression This adds a special global buffer pool (in the end) for reserved pages. Using a reserved pool for LZ4 decompression significantly reduces the time spent on extra temporary page allocation for the extreme cases in low memory scenarios. The table below shows the reduction in time spent on page allocation for LZ4 decompression when using a reserved pool. The results were obtained from multi-app launch benchmarks on ARM64 Android devices running the 5.15 kernel with an 8-core CPU and 8GB of memory. In the benchmark, we launched 16 frequently-used apps, and the camera app was the last one in each round. The data in the table is the average time of camera app for each round. After using the reserved pool, there was an average improvement of 150ms in the overall launch time of our camera app, which was obtained from the systrace log. +--------------+---------------+--------------+---------+ \| \| w/o page pool \| w/ page pool \| diff \| +--------------+---------------+--------------+---------+ \| Average (ms) \| 3434 \| 21 \| -99.38% \| +--------------+---------------+--------------+---------+ Based on the benchmark logs, 64 pages are sufficient for 95% of scenarios. This value can be adjusted with a module parameter `reserved_pages`. The default value is 0. This pool is currently only used for the LZ4 decompressor, but it can be applied to more decompressors if needed. Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240402131523.2703948-1-guochunhai@vivo.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-05-08 17:12:51 +08:00
Chunhai Guo	d6db47e571	erofs: do not use pagepool in z_erofs_gbuf_growsize() Let's use alloc_pages_bulk_array() for simplicity and get rid of unnecessary pagepool. Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240402092757.2635257-1-guochunhai@vivo.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-05-08 17:12:50 +08:00
Chunhai Guo	f36f3010f6	erofs: rename per-CPU buffers to global buffer pool and make it configurable It will cost more time if compressed buffers are allocated on demand for low-latency algorithms (like lz4) so EROFS uses per-CPU buffers to keep compressed data if in-place decompression is unfulfilled. While it is kind of wasteful of memory for a device with hundreds of CPUs, and only a small number of CPUs concurrently decompress most of the time. This patch renames it as 'global buffer pool' and makes it configurable. This allows two or more CPUs to share a common buffer to reduce memory occupation. Suggested-by: Gao Xiang <xiang@kernel.org> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Link: https://lore.kernel.org/r/20240402100036.2673604-1-guochunhai@vivo.com Signed-off-by: Sandeep Dhavale <dhavale@google.com> Link: https://lore.kernel.org/r/20240408215231.3376659-1-dhavale@google.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-05-08 17:12:49 +08:00
Chunhai Guo	cacd5b04e2	erofs: rename utils.c to zutil.c Currently, utils.c is only useful if CONFIG_EROFS_FS_ZIP is on. So let's rename it to zutil.c as well as avoid its inclusion if CONFIG_EROFS_FS_ZIP is explicitly disabled. Signed-off-by: Chunhai Guo <guochunhai@vivo.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20240401135550.2550043-1-guochunhai@vivo.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2024-05-08 17:12:49 +08:00
Richard Fung	9fe2a036a2	fuse: Add initial support for fs-verity This adds support for the FS_IOC_ENABLE_VERITY and FS_IOC_MEASURE_VERITY ioctls. The FS_IOC_READ_VERITY_METADATA is missing but from the documentation, "This is a fairly specialized use case, and most fs-verity users won’t need this ioctl." Signed-off-by: Richard Fung <richardfung@google.com> Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-05-08 09:31:21 +02:00
Brian Foster	96d88f65ad	virtiofs: include a newline in sysfs tag The internal tag string doesn't contain a newline. Append one when emitting the tag via sysfs. [Stefan] Orthogonal to the newline issue, sysfs_emit(buf, "%s", fs->tag) is needed to prevent format string injection. Signed-off-by: Brian Foster <bfoster@redhat.com> Fixes: `a8f62f50b4` ("virtiofs: export filesystem tags through sysfs") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-05-08 09:31:21 +02:00
Matthew Wilcox (Oracle)	413e8f014c	fuse: Convert fuse_readpages_end() to use folio_end_read() Nobody checks the error flag on fuse folios, so stop setting it. Optimise the (optional) setting of the uptodate flag and clearing of the lock flag by using folio_end_read(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2024-05-08 09:31:21 +02:00
Baokun Li	dc1c4663bc	ext4: propagate errors from ext4_sb_bread() in ext4_xattr_block_cache_find() In ext4_xattr_block_cache_find(), when ext4_sb_bread() returns an error, we will either continue to find the next ea block or return NULL to try to insert a new ea block. But whether ext4_sb_bread() returns -EIO or -ENOMEM, the next operation is most likely to fail with the same error. So propagate the error returned by ext4_sb_bread() to make ext4_xattr_block_set() fail to reduce pointless operations. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240504075526.2254349-3-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-05-07 15:59:18 -04:00
Baokun Li	0c0b4a49d3	ext4: fix mb_cache_entry's e_refcnt leak in ext4_xattr_block_cache_find() Syzbot reports a warning as follows: ============================================ WARNING: CPU: 0 PID: 5075 at fs/mbcache.c:419 mb_cache_destroy+0x224/0x290 Modules linked in: CPU: 0 PID: 5075 Comm: syz-executor199 Not tainted 6.9.0-rc6-gb947cc5bf6d7 RIP: 0010:mb_cache_destroy+0x224/0x290 fs/mbcache.c:419 Call Trace: <TASK> ext4_put_super+0x6d4/0xcd0 fs/ext4/super.c:1375 generic_shutdown_super+0x136/0x2d0 fs/super.c:641 kill_block_super+0x44/0x90 fs/super.c:1675 ext4_kill_sb+0x68/0xa0 fs/ext4/super.c:7327 [...] ============================================ This is because when finding an entry in ext4_xattr_block_cache_find(), if ext4_sb_bread() returns -ENOMEM, the ce's e_refcnt, which has already grown in the __entry_find(), won't be put away, and eventually trigger the above issue in mb_cache_destroy() due to reference count leakage. So call mb_cache_entry_put() on the -ENOMEM error branch as a quick fix. Reported-by: syzbot+dd43bd0f7474512edc47@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=dd43bd0f7474512edc47 Fixes: `fb265c9cb4` ("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240504075526.2254349-2-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-05-07 15:59:18 -04:00
Colin Ian King	8b57de1c5e	jbd2: remove redundant assignement to variable err The variable err is being assigned a value that is never read, it is being re-assigned inside the following while loop and also after the while loop. The assignment is redundant and can be removed. Cleans up clang scan build warning: fs/jbd2/commit.c:574:2: warning: Value stored to 'err' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240410112803.232993-1-colin.i.king@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-05-07 15:52:20 -04:00
Zhang Yi	df0b5afc62	ext4: remove the redundant folio_wait_stable() __filemap_get_folio() with FGP_WRITEBEGIN parameter has already wait for stable folio, so remove the redundant folio_wait_stable() in ext4_da_write_begin(), it was left over from the commit `cc883236b7` ("ext4: drop unnecessary journal handle in delalloc write") that removed the retry getting page logic. Fixes: `cc883236b7` ("ext4: drop unnecessary journal handle in delalloc write") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240419023005.2719050-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-05-07 15:48:04 -04:00
Dan Carpenter	3f4830abd2	ext4: fix potential unnitialized variable Smatch complains "err" can be uninitialized in the caller. fs/ext4/indirect.c:349 ext4_alloc_branch() error: uninitialized symbol 'err'. Set the error to zero on the success path. Fixes: `8016e29f43` ("ext4: fast commit recovery path") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/363a4673-0fb8-4adf-b4fb-90a499077276@moroto.mountain Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-05-07 15:44:40 -04:00
Matthew Wilcox (Oracle)	c84f1510fb	ext4: convert ac_buddy_page to ac_buddy_folio This just carries around the bd_buddy_folio so should also be a folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240416172900.244637-6-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-05-07 15:38:17 -04:00
Matthew Wilcox (Oracle)	ccedf35b5d	ext4: convert ac_bitmap_page to ac_bitmap_folio This just carries around the bd_bitmap_folio so should also be a folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240416172900.244637-5-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-05-07 15:38:14 -04:00
Matthew Wilcox (Oracle)	e1622a0d55	ext4: convert ext4_mb_init_cache() to take a folio All callers now have a folio, so convert this function from operating on a page to operating on a folio. The folio is assumed to be a single page. Signe-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240416172900.244637-4-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-05-07 15:38:10 -04:00
Matthew Wilcox (Oracle)	5eea586b47	ext4: convert bd_buddy_page to bd_buddy_folio There is no need to make this a multi-page folio, so leave all the infrastructure around it in pages. But since we're locking it, playing with its refcount and checking whether it's uptodate, it needs to move to the folio API. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240416172900.244637-3-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-05-07 15:38:07 -04:00
Matthew Wilcox (Oracle)	99b150d84e	ext4: convert bd_bitmap_page to bd_bitmap_folio There is no need to make this a multi-page folio, so leave all the infrastructure around it in pages. But since we're locking it, playing with its refcount and checking whether it's uptodate, it needs to move to the folio API. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20240416172900.244637-2-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-05-07 15:37:46 -04:00
Dan Carpenter	0e39c9e524	btrfs: qgroup: fix initialization of auto inherit array The "i++" was accidentally left out so it just sets qgids[0] over and over. This can lead to unexpected problems, as the groups[1:] would be all 0, leading to later find_qgroup_rb() unable to find a qgroup and cause snapshot creation failure. Fixes: `5343cd9364` ("btrfs: qgroup: simple quota auto hierarchy for nested subvolumes") CC: stable@vger.kernel.org # 6.7+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:11 +02:00
Matthew Wilcox (Oracle)	bc00965dbf	btrfs: count super block write errors in device instead of tracking folio error state Currently the error status of super block write is tracked in page/folio status bit Error. For that we need to keep the reference for the whole duration of write and wait. Count the number of superblock writeback errors in the btrfs_device. That means we don't need the folio to stay around until it's waited for, and can avoid the extra call to folio_get/put. Also remove a mention of PageError in a comment as it's the last mention of the page Error state. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:11 +02:00
Matthew Wilcox (Oracle)	617fb10ea8	btrfs: use the folio iterator in btrfs_end_super_write() Iterate over folios instead of bvecs. Switch the order of unlock and put to be the usual order; we know this folio can't be put until it's been waited for, but that's fragile. Remove the calls to ClearPageUptodate / SetPageUptodate -- if PAGE_SIZE is larger than BTRFS_SUPER_INFO_SIZE, we'd be marking the entire folio uptodate without having actually initialised all the bytes in the page. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:10 +02:00
Matthew Wilcox (Oracle)	f93ee0df51	btrfs: convert super block writes to folio in write_dev_supers() This is a direct conversion from pages to folios, assuming single page folio. Also removes some calls to obsolete APIs and some hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:10 +02:00
Matthew Wilcox (Oracle)	c94b7349b8	btrfs: convert super block writes to folio in wait_dev_supers() This is a direct conversion from pages to folios, assuming single page folio. Also removes a few calls to compound_head() and calls to obsolete APIs. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:10 +02:00
Thorsten Blum	58a774ca16	btrfs: remove duplicate included header from fs.h Remove duplicate included header file linux/blkdev.h . Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:10 +02:00
Josef Bacik	6b0a63a4fa	btrfs: add a cached state to extent_clear_unlock_delalloc Now that we have the lock_extent tightly coupled with extent_clear_unlock_delalloc we can add a cached state to extent_clear_unlock_delalloc and benefit from skipping the extra lookup when we're doing cow. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:10 +02:00
Josef Bacik	8325f41a56	btrfs: push extent lock down in submit_one_async_extent We don't need to include the time we spend in the allocator under our extent lock protection, move it after the allocator and make sure we lock the extent in the error case to ensure we're not clearing these bits without the extent lock held. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:10 +02:00
Josef Bacik	d456c25dbb	btrfs: push lock_extent down in cow_file_range() Now that we've got the extent lock pushed into cow_file_range() we can push it further down into the allocation loop. This allows us to only hold the extent lock during the dropping of the extent map range and inserting the ordered extent. This makes the error case a little trickier as we'll now have to lock the range before clearing any of the other extent bits for the range, but this is the error path so is less performance critical. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:10 +02:00
Josef Bacik	cd241a8f55	btrfs: move can_cow_file_range_inline() outside of the extent lock These checks aren't reliant on the extent lock. Move this up into cow_file_range_inline(), and then update encoded writes to call this check before calling __cow_file_range_inline(). This will allow us to skip the extent lock if we're not able to inline the given extent. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:10 +02:00
Josef Bacik	0ab540995a	btrfs: push lock_extent into cow_file_range_inline Now that we've pushed the lock_extent() into cow_file_range() we can push the extent locking into cow_file_range_inline() and move the lock_extent in cow_file_range() to after we call cow_file_range_inline(). Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:10 +02:00
Josef Bacik	a0766d8f35	btrfs: push extent lock into cow_file_range Now that cow_file_range is the only function that is called with the range locked, push this call into cow_file_range so we can further narrow the scope. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:10 +02:00
Josef Bacik	00009d7bcb	btrfs: push extent lock into run_delalloc_cow This is used by zoned but also as the fallback for uncompressed extents when we fail to compress the ranges. Push the extent lock into run_dealloc_cow(), and adjust the compression case to take the extent lock after calling run_delalloc_cow(). Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:09 +02:00
Josef Bacik	0e128d4e41	btrfs: remove unlock_extent from run_delalloc_compressed Since we immediately unlock the extent range when we enter run_delalloc_compressed() simply move the lock_extent() down to cover cow_file_range() and then remove the unlock_extent() from run_delalloc_compressed. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:09 +02:00
Josef Bacik	aa56b0aa91	btrfs: push extent lock down in run_delalloc_nocow run_delalloc_nocow is a little special because we use the file extents to see if we can nocow a range. We don't actually need the protection of the extent lock to look at the file extents at this point however. We are currently holding the page lock for this range, so we are protected from anybody who would simultaneously be modifying the file extent items for this range. * mmap() - we're holding the page lock. * buffered writes - we're holding the page lock. * direct writes - we're holding the page lock and direct IO has to flush page cache before it's able to continue. * fallocate() - all callers flush the range and wait on ordered extents while holding the inode lock and the mmap lock, so we are again saved by the page lock. We want to use the extent lock to protect 1) The mapping tree for the given range. 2) The ordered extents for the given range. 3) The io_tree for the given range. Push the extent lock down to cover these operations. In the fallback_to_cow() case we simply lock before doing anything and rely on the cow_file_range() helper to handle it's range properly. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:09 +02:00
Josef Bacik	0ed30c17f6	btrfs: adjust while loop condition in run_delalloc_nocow We have the following pattern while (1) { if (cur_offset > end) break; } Which is just while (cur_offset <= end) { ... } so adjust the code to be more clear. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:09 +02:00
Josef Bacik	7c9acd440f	btrfs: push extent lock into run_delalloc_nocow run_delalloc_nocow is a bit special as it walks through the file extents for the inode and determines what it can nocow and what it can't. This is the more complicated area for extent locking, so start with this function. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:09 +02:00
Josef Bacik	c0707c9e1e	btrfs: push the extent lock into btrfs_run_delalloc_range We want to limit the scope of the extent lock to be around operations that can change in flight. Currently we hold the extent lock through the entire writepage operation, which isn't really necessary. We want to protect to make sure nobody has updated DELALLOC. In find_lock_delalloc_range we must lock the range in order to validate the contents of our io_tree. However once we've done that we're safe to unlock the range and continue, as we have the page lock already held for the range. We are protected from all operations at this point. * mmap() - we're holding the page lock, thus are protected. * buffered writes - again, we're protected because we take the page lock for the first and last page in our range for buffered writes so we won't create new delalloc ranges in this area. * direct IO - we invalidate pagecache before attempting to write a new area, which requires the page lock, so again are protected once we're holding the page lock on this range. Additionally this behavior actually already exists for compressed, we unlock the range as soon as we start to process the async extents, and re-lock it during compression. So this is completely safe, and makes the locking more consistent. Make this simple by just pushing the extent lock into btrfs_run_delalloc_range. From there followup patches will push the lock further down into its users. Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:09 +02:00
Josef Bacik	7034674b8a	btrfs: lock extent when doing inline extent in compression We currently don't lock the extent when we're doing a cow_file_range_inline() for a compressed extent. This isn't a problem necessarily, but it's inconsistent with the rest of our usage of cow_file_range_inline(). This also leads to some extra weird logic around whether the extent is locked or not. Fix this to lock the extent before calling cow_file_range_inline() in compression to make it consistent with the rest of the inline users. In future patches this will be pushed down into the cow_file_range_inline() helper, so we're fine with the quick and dirty locking here. This patch exists to make the behavior change obvious. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:09 +02:00
Josef Bacik	0586d0a89e	btrfs: move extent bit and page cleanup into cow_file_range_inline We duplicate the extent cleanup for cow_file_range_inline() in the cow and compressed case. The encoded case doesn't need to do cleanup the same way, so rename cow_file_range_inline to __cow_file_range_inline and then make cow_file_range_inline handle the extent cleanup appropriately, and update the callers. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:09 +02:00
Josef Bacik	0332967b4d	btrfs: unlock all the pages with successful inline extent creation Since `4750af3bbe` ("btrfs: prevent extent_clear_unlock_delalloc() to unlock page not locked by __process_pages_contig()") we have been unlocking the locked page manually instead of via extent_clear_unlock_delalloc() because of subpage blocksize support. However we actually disable inline extent creation for subpage blocksize support, so this behavior isn't necessary. Remove this code and comment, if at some point the subpage blocksize code grows support for inline extents this can be re-evaluated. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:09 +02:00
Josef Bacik	6eecfa2240	btrfs: push all inline logic into cow_file_range Currently we have a lot of duplicated checks of if (start == 0 && fs_info->sectorsize == PAGE_SIZE) cow_file_range_inline(); Instead of duplicating this check everywhere, consolidate all of the inline extent logic into a helper which documents all of the checks and then use that helper inside of cow_file_range_inline(). With this we can clean up all of the calls to either unconditionally call cow_file_range_inline(), or at least reduce the checks we're doing before we call cow_file_range_inline(); Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:09 +02:00
Josef Bacik	aa5ccf2917	btrfs: handle errors in btrfs_reloc_clone_csums properly In the cow path we will clone the reloc csums for relocated data extents, and if there's an error we already have an ordered extent and rely on the ordered extent finishing to clean everything up. There's a problem however, we don't mark the ordered extent with an error, we pretend like everything was just fine. If we were at the end of our range we won't actually bubble up this error anywhere, and we could end up inserting an extent that doesn't have csums where it should have them. Fix this by adding a helper to mark the ordered extent with an error, and then use this when we fail to lookup the csums in btrfs_reloc_clone_csums. Use this helper in the other place where we use the same pattern while we're here. This will prevent us from erroneously inserting the extent that doesn't have the required checksums. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:09 +02:00
Qu Wenruo	e98bf64f7a	btrfs: add extra sanity checks for create_io_em() The function create_io_em() is called before we submit an IO, to update the in-memory extent map for the involved range. This patch changes the following aspects: - Does not allow BTRFS_ORDERED_NOCOW type For real NOCOW (excluding NOCOW writes into preallocated ranges) writes, we never call create_io_em(), as we does not need to update the extent map at all. So remove the sanity check allowing BTRFS_ORDERED_NOCOW type. - Add extra sanity checks * PREALLOC - @block_len == len For uncompressed writes. * REGULAR - @block_len == @orig_block_len == @ram_bytes == @len We're creating a new uncompressed extent, and referring all of it. - @orig_start == @start We haven no offset inside the extent. * COMPRESSED - valid @compress_type - @len <= @ram_bytes This is to co-operate with encoded writes, which can cause a new file extent referring only part of a uncompressed extent. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:08 +02:00
Qu Wenruo	4bdc558bf9	btrfs: simplify the inline extent map creation With the tree-checker ensuring all inline file extents starts at file offset 0 and has a length no larger than sectorsize, we can simplify the calculation to assigned those fixes values directly. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:08 +02:00
Qu Wenruo	319d91ee72	btrfs: add extra comments on extent_map members The extent_map structure is very critical to btrfs, as it is involved for both read and write paths. Unfortunately the structure is not properly explained, making it pretty hard to understand nor to do further improvement. This patch adds extra comments explaining the major members based on my code reading. Hopefully we can find more members to cleanup in the future. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:08 +02:00
Naohiro Aota	30704a0d56	btrfs: drop unused argument of calcu_metadata_size() calcu_metadata_size() has a "reserve" argument, but the only caller always set it to "1". The other usage (reserve = 0) is dropped by a commit `0647bf564f` ("Btrfs: improve forever loop when doing balance relocation"), which is more than 10 years ago. Drop the argument and simplify the code. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-05-07 21:31:08 +02:00

... 3 4 5 6 7 ...

91808 Commits