iv/linux

Author	SHA1	Message	Date
Kent Overstreet	0124f42da7	bcachefs: Don't pass memcmp() as a pointer Some (buggy!) compilers have issues with this. Fixes: https://github.com/koverstreet/bcachefs/issues/625 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:04 -05:00
Kent Overstreet	57f2d20976	bcachefs: Reduce would_deadlock restarts We don't have to take locks in any particular ordering - we'll make forward progress just fine - but if we try to stick to an ordering, it can help to avoid excessive would_deadlock transaction restarts. This tweaks the reflink path to take extents btree locks in the right order. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 06:01:45 -05:00
Kent Overstreet	5b14ce35af	bcachefs: bch2_trans_account_disk_usage_change() The disk space accounting rewrite is splitting out accounting for each replicas set - those are moving to btree keys, instead of percpu counters. This breaks bch2_trans_fs_usage_apply() up, splitting out the part we will still need. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 06:01:45 -05:00
Kent Overstreet	8e7834a883	bcachefs: bch_fs_usage_base Split out base filesystem usage into its own type; prep work for breaking up bch2_trans_fs_usage_apply(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 06:01:45 -05:00
Kent Overstreet	4f564f4f9f	bcachefs: bch2_prt_compression_type() bounds checking helper, since compression types are extensible Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 06:01:45 -05:00
Kent Overstreet	e58f963cec	bcachefs: helpers for printing data types We need bounds checking since new versions may introduce new data types. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 06:01:45 -05:00
Kent Overstreet	38c23fb809	bcachefs: BTREE_TRIGGER_ATOMIC Add a new flag to be explicit about when we're running atomic triggers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 06:01:45 -05:00
Kent Overstreet	9d5dba2ba8	bcachefs: drop to_text code for obsolete bps in alloc keys Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 06:01:45 -05:00
Kent Overstreet	3fe8a18640	bcachefs: eytzinger_for_each() declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 06:01:45 -05:00
Kent Overstreet	4ecad0da9d	bcachefs: Don't log errors if BCH_WRITE_ALLOC_NOWAIT Previously, we added logging in the write path to ensure that any unexpected errors getting reported to userspace have a log message; but BCH_WRITE_ALLOC_NOWAIT is a special case, it's used for promotes where errors are expected and not reported out to userspace - so we need to silence those. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 06:01:45 -05:00
Su Yue	e240c1b363	bcachefs: fix memleak in bch2_split_devs The pointer dev_name can be modified by strsep(), which then causes a memleak: unreferenced object 0xffff9d08a2916c80 (size 32): comm "mount.bcachefs", pid 9090, jiffies 4295856224 (age 17.564s) hex dump (first 32 bytes): 2f 64 65 76 2f 6d 61 70 70 65 72 2f 74 65 73 74 /dev/mapper/test 2d 30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 -0.............. backtrace: [<00000000c5d3be7d>] __kmem_cache_alloc_node+0x1f3/0x2c0 [<0000000052215d26>] __kmalloc_node_track_caller+0x51/0x150 [<0000000069fea956>] kstrdup+0x32/0x60 [<000000000877fcf1>] bch2_split_devs+0x3f/0x150 [bcachefs] [<000000007ee93204>] bch2_mount+0xcb/0x640 [bcachefs] [<000000002dd1e04b>] legacy_get_tree+0x30/0x60 [<000000006afc31d3>] vfs_get_tree+0x28/0xf0 [<000000007b0c538e>] path_mount+0x475/0xb60 [<0000000092de5882>] __x64_sys_mount+0x105/0x140 [<0000000054fc05d8>] do_syscall_64+0x42/0xf0 [<00000000df584910>] entry_SYSCALL_64_after_hwframe+0x6e/0x76 Fix it by copying the dev_name pointer at the beginning and freeing the copy at the end. Signed-off-by: Su Yue <glass.su@suse.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 06:01:45 -05:00
Kees Cook	e28b035958	bcachefs: Replace strlcpy() with strscpy() strlcpy() reads the entire source buffer first. This read may exceed the destination size limit. This is both inefficient and can lead to linear read overflows if a source string is not NUL-terminated[1]. Additionally, it returns the size of the source string, not the resulting size of the destination string. In an effort to remove strlcpy() completely[2], replace strlcpy() here with strscpy(). Nothing checks the return value here, so a direct replacement with strscpy() is possible. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy [1] Link: https://github.com/KSPP/linux/issues/89 [2] Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Brian Foster <bfoster@redhat.com> Cc: <linux-bcachefs@vger.kernel.org> Link: https://lore.kernel.org/r/20240110235438.work.385-kees@kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>	2024-01-18 12:29:21 -08:00
Linus Torvalds	f16ab99c2e	fix buggered locking in bch2_ioctl_subvolume_destroy() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Merge tag 'pull-bcachefs-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull bcachefs locking fix from Al Viro: "Fix broken locking in bch2_ioctl_subvolume_destroy()" * tag 'pull-bcachefs-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: bch2_ioctl_subvolume_destroy(): fix locking new helper: user_path_locked_at()	2024-01-12 18:04:01 -08:00
Linus Torvalds	999a36b52b	bcachefs updates for 6.8 Merge tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefs Pull bcachefs updates from Kent Overstreet: - btree write buffer rewrite: instead of adding keys to the btree write buffer at transaction commit time, we now journal them with a different journal entry type and copy them from the journal to the write buffer just prior to journal write. This reduces the number of atomic operations on shared cachelines in the transaction commit path and is a significant performance improvement on some workloads: multithreaded 4k random writes went from ~650k iops to ~850k iops. - Bring back optimistic spinning for six locks: the new implementation doesn't use osq locks; instead we add to the lock waitlist as normal, and then spin on the lock_acquired bit in the waitlist entry, _not_ the lock itself. - New ioctls: - BCH_IOCTL_DEV_USAGE_V2, which allows for new data types - BCH_IOCTL_OFFLINE_FSCK, which runs the kernel implementation of fsck but without mounting: useful for transparently using the kernel version of fsck from 'bcachefs fsck' when the kernel version is a better match for the on disk filesystem. - BCH_IOCTL_ONLINE_FSCK: online fsck. Not all passes are supported yet, but the passes that are supported are fully featured - errors may be corrected as normal. The new ioctls use the new 'thread_with_file' abstraction for kicking off a kthread that's tied to a file descriptor returned to userspace via the ioctl. - btree_paths within a btree_trans are now dynamically growable, instead of being limited to 64. This is important for the check_directory_structure phase of fsck, and also fixes some issues we were having with btree path overflow in the reflink btree. - Trigger refactoring; prep work for the upcoming disk space accounting rewrite - Numerous bugfixes :) * tag 'bcachefs-2024-01-10' of https://evilpiepirate.org/git/bcachefs: (226 commits) bcachefs: eytzinger0_find() search should be const bcachefs: move "ptrs not changing" optimization to bch2_trigger_extent() bcachefs: fix simulateously upgrading & downgrading bcachefs: Restart recovery passes more reliably bcachefs: bch2_dump_bset() doesn't choke on u64s == 0 bcachefs: improve checksum error messages bcachefs: improve validate_bset_keys() bcachefs: print sb magic when relevant bcachefs: __bch2_sb_field_to_text() bcachefs: %pg is banished bcachefs: Improve would_deadlock trace event bcachefs: fsck_err()s don't need to manually check c->sb.version anymore bcachefs: Upgrades now specify errors to fix, like downgrades bcachefs: no thread_with_file in userspace bcachefs: Don't autofix errors we can't fix bcachefs: add missing bch2_latency_acct() call bcachefs: increase max_active on io_complete_wq bcachefs: add time_stats for btree_node_read_done() bcachefs: don't clear accessed bit in btree node fill bcachefs: Add an option to control btree node prefetching ...	2024-01-10 16:34:17 -08:00
Linus Torvalds	fb46e22a9e	Merge tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Peng Zhang has done some maple tree maintenance work in the series 'maple_tree: add mt_free_one() and mt_attr() helpers' 'Some cleanups of maple tree' - In the series 'mm: use memmap_on_memory semantics for dax/kmem' Vishal Verma has altered the interworking between memory-hotplug and dax/kmem so that newly added 'device memory' can more easily have its memmap placed within that newly added memory. - Matthew Wilcox continues folio-related work (including a few fixes) in the patch series 'Add folio_zero_tail() and folio_fill_tail()' 'Make folio_start_writeback return void' 'Fix fault handler's handling of poisoned tail pages' 'Convert aops->error_remove_page to ->error_remove_folio' 'Finish two folio conversions' 'More swap folio conversions' - Kefeng Wang has also contributed folio-related work in the series 'mm: cleanup and use more folio in page fault' - Jim Cromie has improved the kmemleak reporting output in the series 'tweak kmemleak report format'. - In the series 'stackdepot: allow evicting stack traces' Andrey Konovalov permits clients (in this case KASAN) to cause eviction of no longer needed stack traces. - Charan Teja Kalla has fixed some accounting issues in the page allocator's atomic reserve calculations in the series 'mm: page_alloc: fixes for high atomic reserve caluculations'. - Dmitry Rokosov has added to the samples/ directory some sample code for a userspace memcg event listener application. See the series 'samples: introduce cgroup events listeners'. - Some maple tree maintenance work from Liam Howlett in the series 'maple_tree: iterator state changes'. - Nhat Pham has improved zswap's approach to writeback in the series 'workload-specific and memory pressure-driven zswap writeback'. - DAMON/DAMOS feature and maintenance work from SeongJae Park in the series 'mm/damon: let users feed and tame/auto-tune DAMOS' 'selftests/damon: add Python-written DAMON functionality tests' 'mm/damon: misc updates for 6.8' - Yosry Ahmed has improved memcg's stats flushing in the series 'mm: memcg: subtree stats flushing and thresholds'. - In the series 'Multi-size THP for anonymous memory' Ryan Roberts has added a runtime opt-in feature to transparent hugepages which improves performance by allocating larger chunks of memory during anonymous page faults. - Matthew Wilcox has also contributed some cleanup and maintenance work against the buffer_head code in the series 'More buffer_head cleanups'. - Suren Baghdasaryan has done work on Andrea Arcangeli's series 'userfaultfd move option'. UFFDIO_MOVE permits userspace heap compaction algorithms to move userspace's pages around rather than UFFDIO_COPY's alloc/copy/free. - Stefan Roesch has developed a 'KSM Advisor', in the series 'mm/ksm: Add ksm advisor'. This is a governor which tunes KSM's scanning aggressiveness in response to userspace's current needs. - Chengming Zhou has optimized zswap's temporary working memory use in the series 'mm/zswap: dstmem reuse optimizations and cleanups'. - Matthew Wilcox has performed some maintenance work on the writeback code, both core and within filesystems. The series is 'Clean up the writeback paths'. - Andrey Konovalov has optimized KASAN's handling of alloc and free stack traces for secondary-level allocators, in the series 'kasan: save mempool stack traces'. - Andrey also performed some KASAN maintenance work in the series 'kasan: assorted clean-ups'. - David Hildenbrand has gone to town on the rmap code. Cleanups, more pte batching, folio conversions and more. See the series 'mm/rmap: interface overhaul'. - Kinsey Ho has contributed some maintenance work on the MGLRU code in the series 'mm/mglru: Kconfig cleanup'. - Matthew Wilcox has contributed lruvec page accounting code cleanups in the series 'Remove some lruvec page accounting functions'" * tag 'mm-stable-2024-01-08-15-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (361 commits) mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDER mm, treewide: introduce NR_PAGE_ORDERS selftests/mm: add separate UFFDIO_MOVE test for PMD splitting selftests/mm: skip test if application doesn't has root privileges selftests/mm: conform test to TAP format output selftests: mm: hugepage-mmap: conform to TAP format output selftests/mm: gup_test: conform test to TAP format output mm/selftests: hugepage-mremap: conform test to TAP format output mm/vmstat: move pgdemote_* out of CONFIG_NUMA_BALANCING mm: zsmalloc: return -ENOSPC rather than -EINVAL in zs_malloc while size is too large mm/memcontrol: remove __mod_lruvec_page_state() mm/khugepaged: use a folio more in collapse_file() slub: use a folio in __kmalloc_large_node slub: use folio APIs in free_large_kmalloc() slub: use alloc_pages_node() in alloc_slab_page() mm: remove inc/dec lruvec page state functions mm: ratelimit stat flush from workingset shrinker kasan: stop leaking stack trace handles mm/mglru: remove CONFIG_TRANSPARENT_HUGEPAGE mm/mglru: add dummy pmd_dirty() ...	2024-01-09 11:18:47 -08:00
Linus Torvalds	3f6984e730	vfs-6.8.super Merge tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs super updates from Christian Brauner: "This contains the super work for this cycle including the long-awaited series by Jan to make it possible to prevent writing to mounted block devices: - Writing to mounted devices is dangerous and can lead to filesystem corruption as well as crashes. Furthermore syzbot comes with more and more involved examples how to corrupt block device under a mounted filesystem leading to kernel crashes and reports we can do nothing about. Add tracking of writers to each block device and a kernel cmdline argument which controls whether other writeable opens to block devices open with BLK_OPEN_RESTRICT_WRITES flag are allowed. Note that this effectively only prevents modification of the particular block device's page cache by other writers. The actual device content can still be modified by other means - e.g. by issuing direct scsi commands, by doing writes through devices lower in the storage stack (e.g. in case loop devices, DM, or MD are involved) etc. But blocking direct modifications of the block device page cache is enough to give filesystems a chance to perform data validation when loading data from the underlying storage and thus prevent kernel crashes. Syzbot can use this cmdline argument option to avoid uninteresting crashes. Also users whose userspace setup does not need writing to mounted block devices can set this option for hardening. We expect that this will be interesting to quite a few workloads. Btrfs is currently opted out of this because they still haven't merged patches we require for this to work from three kernel releases ago. - Reimplement block device freezing and thawing as holder operations on the block device. This allows us to extend block device freezing to all devices associated with a superblock and not just the main device. It also allows us to remove get_active_super() and thus another function that scans the global list of superblocks. Freezing via additional block devices only works if the filesystem chooses to use @fs_holder_ops for these additional devices as well. That currently only includes ext4 and xfs. Earlier releases switched get_tree_bdev() and mount_bdev() to use @fs_holder_ops. The remaining nilfs2 open-coded version of mount_bdev() has been converted to rely on @fs_holder_ops as well. So block device freezing for the main block device will continue to work as before. There should be no regressions in functionality. The only special case is btrfs where block device freezing for the main block device never worked because sb->s_bdev isn't set. Block device freezing for btrfs can be fixed once they can switch to @fs_holder_ops but that can happen whenever they're ready" * tag 'vfs-6.8.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits) block: Fix a memory leak in bdev_open_by_dev() super: don't bother with WARN_ON_ONCE() super: massage wait event mechanism ext4: Block writes to journal device xfs: Block writes to log device fs: Block writes to mounted block devices btrfs: Do not restrict writes to btrfs devices block: Add config option to not allow writing to mounted devices block: Remove blkdev_get_by_*() functions bcachefs: Convert to bdev_open_by_path() fs: handle freezing from multiple devices fs: remove dead check nilfs2: simplify device handling fs: streamline thaw_super_locked ext4: simplify device handling xfs: simplify device handling fs: simplify setup_bdev_super() calls blkdev: comment fs_holder_ops porting: document block device freeze and thaw changes fs: remove unused helper ...	2024-01-08 10:43:51 -08:00
Kent Overstreet	169de41985	bcachefs: eytzinger0_find() search should be const Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:46 -05:00
Kent Overstreet	f5d4481c3e	bcachefs: move "ptrs not changing" optimization to bch2_trigger_extent() This is useful for btree ptrs as well, when we're just updating sectors_written. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:46 -05:00
Kent Overstreet	e7999235e6	bcachefs: fix simulateously upgrading & downgrading Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	72e2c920e4	bcachefs: Restart recovery passes more reliably Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	d04d272743	bcachefs: bch2_dump_bset() doesn't choke on u64s == 0 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	4819b66e29	bcachefs: improve checksum error messages new helpers: - bch2_csum_to_text() - bch2_csum_err_msg() standardize our checksum error messages a bit, and print out the checksums a bit more nicely. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	2d02bfb01b	bcachefs: improve validate_bset_keys() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	5e448c4893	bcachefs: print sb magic when relevant Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	5b88365660	bcachefs: __bch2_sb_field_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	1f5af5fc17	bcachefs: %pg is banished not portable to userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	c13fbb7de2	bcachefs: Improve would_deadlock trace event We now include backtraces for every thread involved in the cycle. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	074cbcdaee	bcachefs: fsck_err()s don't need to manually check c->sb.version anymore Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:21 -05:00
Kent Overstreet	15eaaa4c31	bcachefs: Upgrades now specify errors to fix, like downgrades Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	d641d4cae7	bcachefs: no thread_with_file in userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	a64a37338d	bcachefs: Don't autofix errors we can't fix Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	e9bc59f9df	bcachefs: add missing bch2_latency_acct() call Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	4798bd2443	bcachefs: increase max_active on io_complete_wq this definitely should _not_ be 1, and we don't actually want any concurrency limiting at all here - btree node read completions are getting blocked behind btree node write submissions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	c72e4d7a30	bcachefs: add time_stats for btree_node_read_done() Seeing weird latency issues in the btree node read path - add one for bch2_btree_node_read_done(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	b819f30855	bcachefs: don't clear accessed bit in btree node fill Seeing strange performance issues that might be caused by memory pressure causing prefetched nodes to be evicted before they're used. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	49a5192c0e	bcachefs: Add an option to control btree node prefetching Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	8a0dda6fd6	bcachefs: kill useless return ret Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	f0431c5f47	bcachefs: Combine .trans_trigger, .atomic_trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	4f9ec59f8f	bcachefs: unify extent trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	5a82ec3fea	bcachefs: bch2_trigger_stripe_ptr() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	d55ddf6e7a	bcachefs: Online fsck can now fix errors BCH_FS_fsck_done -> BCH_FS_fsck_running; set when we might be fixing fsck errors. Also, set fix_errors to ask by default when fsck is running. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	1f34c21bc6	bcachefs: bch2_trigger_pointer() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	e4eb3e5ae4	bcachefs: unify stripe trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	f4f78779bb	bcachefs: move stripe triggers to ec.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	153d1c63c2	bcachefs: unify alloc trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	6820ac2cdc	bcachefs: move bch2_mark_alloc() to alloc_background.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	6cacd0c414	bcachefs: unify reservation trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	7bc4d18af4	bcachefs: unify reflink_p trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	08bc959010	bcachefs: unify inode trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	282e7c37eb	bcachefs: kill mem_trigger_run_overwrite_then_insert() now that type signatures are unified, redundant Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	c95e9ec486	bcachefs: BTREE_TRIGGER_TRANSACTIONAL New flag so that triggers can distinguish whether we're running transactional or atomic triggers (or gc) - unifying the callbacks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	089e311347	bcachefs: Kill BTREE_TRIGGER_NOATOMIC dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	ad00bce07d	bcachefs: mark now takes bkey_s Prep work for disk space accounting rewrite: we're going to want to use a single callback for both of our current triggers, so we need to change them to have the same type signature first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	717296c34c	bcachefs: trans_mark now takes bkey_s Prep work for disk space accounting rewrite: we're going to want to use a single callback for both of our current triggers, so we need to change them to have the same type signature first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	eff1f728be	bcachefs: Upgrading uses bch_sb.recovery_passes_required Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	96f37eabe7	bcachefs: factor out thread_with_file, thread_with_stdio thread_with_stdio now knows how to handle input - fsck can now prompt to fix errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	f60250de32	bcachefs: Fix printing of device durability BCH_MEMBER_DURABILITY() was not present initially; a value of 0 means use the default, nonzero means use v - 1. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	8feaebb0ae	bcachefs: __bch2_journal_key_to_wb -> bch2_journal_key_to_wb_slowpath Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	f412392f6e	bcachefs: __journal_keys_sort() refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	371650143d	bcachefs: wb_key_cmp -> wb_key_ref_cmp Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	89056f245b	bcachefs: track transaction durations Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	83322e8ca8	bcachefs: btree_trans always has stats reserve slot 0 for unknown (when we overflow), to avoid some branches Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	0d529663f0	bcachefs: Split brain detection Use the new bch_member->seq, sb->write_time fields to detect split brain and kick out devices when necessary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	6b00de06f5	bcachefs: bch_member->seq Add new fields for split brain detection: - bch_member->seq, which tracks the sequence number of the last superblock write that happened to each member device - bch_sb->write_time, which tracks the time of the last superblock write, to allow detection of when two members have diverged but had the same number of superblock writes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	62719cf33c	bcachefs: Fix nochanges/read_only interaction nochanges means "we cannot issue writes at all"; it's possible to go into a pseudo read-write mode where we pin dirty metadata in memory, which is used for fsck in dry run mode and doing journal replay on a read only mount, but we do not want to allow an actual read-write mount in nochanges mode. But we do always want to allow early read-write, during recovery - this patch clarifies that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	5e32914514	bcachefs: Check journal entries for invalid keys in trans commit path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	c98d132ed1	bcachefs: check_directory_structure() can now be run online Now that we have dynamically resizable btree paths, check_directory_structure() can check one path - inode up to the root - in a single transaction. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	d296e7b185	bcachefs: Fix reattach_inode() for snapshots reattach_inode() was broken w.r.t. snapshots - we'd lookup the subvolume to look up lost+found, but if we're in an interior node snapshot that didn't make any sense. Instead, this adds a dirent path for creating in a specific snapshot, skipping the subvolume; and we also make sure to create lost+found in the root snapshot, to avoid conflicts with lost+found being created in overlapping snapshots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	c558c577cb	bcachefs: bch2_btree_trans_peek_slot_updates refactoring the BTREE_ITER_WITH_UPDATES code, prep for removing the flag and making it always-on Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	359e89add5	bcachefs: bch2_btree_trans_peek_prev_updates bch2_btree_iter_peek_prev() now supports BTREE_ITER_WITH_UPDATES Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	eb6863598a	bcachefs: bch2_btree_trans_peek_updates refactoring the BTREE_ITER_WITH_UPDATES code, prep for removing the flag and making it always-on Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	0c99e17d3b	bcachefs: growable btree_paths XXX: we're allocating memory with btree locks held - bad We need to plumb through an error path so we can do allocate_dropping_locks() - but we're merging this now because it fixes a transaction path overflow caused by indirect extent fragmentation, and the resize path is rare. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	ff70ad2c8d	bcachefs: Fix interior update path btree_path uses Since the btree_paths array is now about to become growable, we have to be careful not to refer to paths by pointer across contexts where they may be reallocated. This fixes the remaining btree_interior_update() paths - split and merge. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	2c3b0fc3bd	bcachefs: trans->nr_paths Start to plumb through dynamically growable btree_paths; this patch replaces most BTREE_ITER_MAX references with trans->nr_paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	5cc6daf749	bcachefs: trans->updates will also be resizable the reflink triggers are also bumping up against the maximum number of paths in a transaction - and generating proportional numbers of updates. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	31403dca5b	bcachefs: optimize __bch2_trans_get(), kill DEBUG_TRANSACTIONS - Some tweaks to greatly reduce locking overhead for the list of btree transactions, so that it can always be enabled: leave btree_trans objects on the list when they're on the percpu single item freelist, and only check for duplicates in the same process when CONFIG_BCACHEFS_DEBUG is enabled - don't zero out the full btree_trans() unless we allocated it from the mempool Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	fea153a845	bcachefs: rcu protect trans->paths Upcoming patches are going to be changing trans->paths to a reallocatable buffer. We need to guard against use after free when it's used by other threads; this introduces RCU protection to those paths and changes them to check for trans->paths == NULL Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	6474b70610	bcachefs: Clean up btree_trans Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	398c98347d	bcachefs: kill btree_path.idx Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	d7e14035a4	bcachefs: get_unlocked_mut_path() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	542e639674	bcachefs: bch2_btree_iter_peek_prev() no longer uses path->idx Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	566eabd36f	bcachefs: bch2_path_get() no longer uses path->idx Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	b0b6737822	bcachefs: trans_for_each_path_with_node() no longer uses path->idx Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	ccb7b08fbb	bcachefs: trans_for_each_path() no longer uses path->idx path->idx is now a code smell: we should be using path_idx_t, since it's stable across btree path reallocation. This is also a bit faster, using the same loop counter vs. fetching path->idx from each path we iterate over. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	4c5289e632	bcachefs: kill trans_for_each_path_from() dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	311e446a41	bcachefs: bch2_btree_path_to_text() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	1f75ba4e65	bcachefs: struct trans_for_each_path_inorder_iter reducing our usage of path->idx Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	7f9821a7c1	bcachefs: btree_insert_entry -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	07f383c71f	bcachefs: btree_iter -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	788cc25d15	bcachefs: btree_path_alloc() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	96ed47d130	bcachefs: bch2_btree_path_traverse() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	f6363acaa6	bcachefs: bch2_btree_path_make_mut() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	4617d94617	bcachefs: bch2_btree_path_set_pos() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	74e600c19a	bcachefs: bch2_path_put() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	255ebbbf75	bcachefs: bch2_path_get() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	5ce8b92da0	bcachefs: minor bch2_btree_path_set_pos() optimization bpos_eq() is cheaper than bpos_cmp() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	4753bdeb26	bcachefs: Kill GFP_NOFAIL usage in readahead path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	806ebf2aa0	bcachefs: Convert split_devs() to darray Bit of cleanup & modernization: also moving this code to util.c, it'll be used by userspace as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	0c0ba8e9c5	bcachefs: skip journal more often in key cache reclaim Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	1a2a9f9f53	bcachefs: for_each_keylist_key() declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	0beebd9245	bcachefs: bkey_for_each_ptr() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	0bc64d7e26	bcachefs: kill __bch2_btree_iter_peek_upto_and_restart() dead code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	4eb3877eae	bcachefs: fsck -> bch2_trans_run() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	cea07a7b6a	bcachefs: vstruct_for_each() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	41b84fb489	bcachefs: for_each_member_device_rcu() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	9fea2274f7	bcachefs: for_each_member_device() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	80eab7a7c2	bcachefs: for_each_btree_key() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	c47e8bfbb7	bcachefs: kill for_each_btree_key_norestart() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	44ddd8ad1e	bcachefs: kill for_each_btree_key_old_upto() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	3a860b5ad5	bcachefs: for_each_btree_key_upto() -> for_each_btree_key_old_upto() And for_each_btree_key2_upto -> for_each_btree_key_upto Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	c8ef2dc2fc	bcachefs: bch2_dirent_lookup() -> lockrestart_do() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	79904fa2bb	bcachefs: bch2_trans_srcu_lock() should be static Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	6d5c606c1c	bcachefs: use track_event_change() for allocator blocked stats Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	ef23397c30	bcachefs: fix warning about uninitialized time_stats Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	e34ec13a56	bcachefs: add more verbose logging Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	53b67d8dcf	bcachefs: better error message in btree_node_write_work() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	037a2d9f48	bcachefs: simplify bch_devs_list Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	defd9e39b5	bcachefs: darray_for_each() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	559e6c2336	bcachefs: trans_for_each_update() now declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	cee0a8ea6d	bcachefs: Improve the nopromote tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	1ad36a010c	bcachefs: Use GFP_KERNEL for promote allocations We already have btree locks dropped here - no need for GFP_NOFS. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Randy Dunlap	920388254f	bcachefs: mean and variance: fix kernel-doc for function params Add missing function parameter descriptions in mean_and_variance.c. This also eliminates the "Excess function parameter" warnings. Prevents these kernel-doc warnings: mean_and_variance.c:67: warning: Function parameter or member 's' not described in 'mean_and_variance_get_mean' mean_and_variance.c:78: warning: Function parameter or member 's1' not described in 'mean_and_variance_get_variance' mean_and_variance.c:94: warning: Function parameter or member 's' not described in 'mean_and_variance_get_stddev' mean_and_variance.c:108: warning: Function parameter or member 's' not described in 'mean_and_variance_weighted_update' mean_and_variance.c:108: warning: Function parameter or member 'x' not described in 'mean_and_variance_weighted_update' mean_and_variance.c:108: warning: Excess function parameter 's1' description in 'mean_and_variance_weighted_update' mean_and_variance.c:108: warning: Excess function parameter 's2' description in 'mean_and_variance_weighted_update' mean_and_variance.c:134: warning: Function parameter or member 's' not described in 'mean_and_variance_weighted_get_mean' mean_and_variance.c:143: warning: Function parameter or member 's' not described in 'mean_and_variance_weighted_get_variance' mean_and_variance.c:153: warning: Function parameter or member 's' not described in 'mean_and_variance_weighted_get_stddev' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Brian Foster <bfoster@redhat.com> Cc: linux-bcachefs@vger.kernel.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	447c1c0105	bcachefs: check for failure to downgrade With the upcoming member seq patch, it's now critical that we don't ever write to a superblock that hasn't been version downgraded - failure to update member seq fields will cause split brain detection to fire erroneously. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	44fd13a4c6	bcachefs: Fixes for rust bindgen bindgen doesn't seem to like u128 or DECLARE_FLEX_ARRAY(), but we can hack around them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Kent Overstreet	023f9ac9f7	bcachefs: Delete dio read alignment check We'll typically format devices with the physical blocksize supported, but the logical blocksize will be smaller. There's no real need to be checking the blocksize at the filesystem level, anyways - the block layer has to check this anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:42 -05:00
Brian Foster	d8d819580a	bcachefs: clean up some dead fallocate code The have_reservation local variable in bch2_extent_fallocate() is initialized to false and set to true further down in the function. Between these two points, one branch of code checks for negative value and one for positive, and nothing ever checks the variable after it is set to true. Clean up some of the unnecessary logic and code. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	a7dc10ce68	bcachefs: Make sure allocation failure errors are logged The previous patch fixed a bug in allocation path error handling, and it would've been noticed sooner had it been logged properly. Generally speaking, errors that shouldn't happen in normal operation and are being returned up the stack should be logged: the write path was already logging IO errors, but non IO errors were missed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	548673f8d3	bcachefs: drop extra semicolon Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Gustavo A. R. Silva	4c26dea1c0	bcachefs: Replace zero-length array with flex-array member and use __counted_by Fake flexible arrays (zero-length and one-element arrays) are deprecated, and should be replaced by flexible-array members. So, replace zero-length array with a flexible-array member in `struct bch_ioctl_fsck_offline`. Also annotate array `devs` with `__counted_by()` to prepare for the coming implementation by GCC and Clang of the `__counted_by` attribute. Flexible array members annotated with `__counted_by` can have their accesses bounds-checked at run-time via `CONFIG_UBSAN_BOUNDS` (for array indexing) and `CONFIG_FORTIFY_SOURCE` (for strcpy/memcpy-family functions). This fixes the following -Warray-bounds warnings: fs/bcachefs/chardev.c: In function 'bch2_ioctl_fsck_offline': fs/bcachefs/chardev.c:363:34: warning: array subscript 0 is outside array bounds of '__u64[0]' {aka 'long long unsigned int[]'} [-Warray-bounds=] 363 | if (copy_from_user(devs, &user_arg->devs[0], sizeof(user_arg->devs[0]) * arg.nr_devs)) { | ^~~~~~~~~~~~~~~~~~ In file included from fs/bcachefs/chardev.c:5: fs/bcachefs/bcachefs_ioctl.h:400:33: note: while referencing 'devs' 400 | __u64 devs[0]; This results in no differences in binary output. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Gustavo A. R. Silva	ac19c4c3d0	bcachefs: Use array_size() in call to copy_from_user() Use array_size() helper, instead of the open-coded version in call to copy_from_user(). Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	038fecc045	bcachefs: qstr_eq() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	cf904c8d96	bcachefs: bch_err_(fn|msg) check if should print Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	e06af20719	bcachefs: fix userspace build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	73ffa53056	bcachefs: Drop journal entry compaction Previously, we dropped empty journal entries and coalesced entries that could be - but it's not worth the overhead; we very rarely leave unused journal entries after getting a journal reservation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	679972348d	bcachefs: kill btree_trans->wb_updates the btree write buffer path now creates a journal entry directly Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	002c76dcf6	bcachefs: check_root() can now be run online check_root() is simple enough to run as one single transaction, so is trivial to run online. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	38ced43bb0	bcachefs: Inline btree write buffer sort The sort in the btree write buffer flush path is a very hot path, and it's particularly performance sensitive since it's single threaded and can block every other thread on a multithreaded write workload. It's well worth doing a sort with inlined cmp and swap functions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	09caeabe1a	bcachefs: btree write buffer now slurps keys from journal Previously, the transaction commit path would have to add keys to the btree write buffer as a separate operation, requiring additional global synchronization. This patch introduces a new journal entry type, which indicates that the keys need to be copied into the btree write buffer prior to being written out. We switch the journal entry type back to JSET_ENTRY_btree_keys prior to write, so this is not an on disk format change. Flushing the btree write buffer may require pulling keys out of journal entries yet to be written, and quiescing outstanding journal reservations; we previously added journal->buf_lock for synchronization with the journal write path. We also can't put strict bounds on the number of keys in the journal destined for the write buffer, which means we might overflow the size of the preallocated buffer and have to reallocate - this introduces a potentially fatal memory allocation failure. This is something we'll have to watch for, if it becomes an issue in practice we can do additional mitigation. The transaction commit path no longer has to explicitly check if the write buffer is full and wait on flushing; this is another performance optimization. Instead, when the btree write buffer is close to full we change the journal watermark, so that only reservations for journal reclaim are allowed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	b05c0e9370	bcachefs: journal->buf_lock Add a new lock for synchronizing between journal IO path and btree write buffer flush. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	0ba9375a11	bcachefs: Unwritten journal buffers are always dirty Ensure that journal bufs that haven't been written can't be reclaimed from the journal pin fifo, and can thus have new pins taken. Prep work for changing the btree write buffer to pull keys from the journal directly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	f33600057f	bcachefs: bch2_trans_node_add no longer uses trans_for_each_path() In the future we'll be making trans->paths resizable and potentially having _many_ more paths (for fsck); we need to start fixing algorithms that walk each path in a transaction where possible. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	24de63dacb	bcachefs: Improve trans->extra_journal_entries Instead of using a darray, we now allocate journal entries for the transaction commit path with our normal bump allocator - with an inlined fastpath, and using btree_transaction_stats to remember how much to initially allocate so as to avoid transaction restarts. This is prep work for converting write buffer updates to use this mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	e4e49375a8	bcachefs: kill bch2_btree_key_cache_flush() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	a83b6c895c	bcachefs: kill btree_path->(alloc_seq|downgrade_seq) These were for extra info in tracepoints for debugging a specialized issue - we do not want to bloat btree_path for this, at least in release builds. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	249bf593e8	bcachefs: Fix snapshot.c assertion for online fsck c->curr_recovery_pass can go backwards; this adds a non rewinding version, c->recovery_pass_done. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Randy Dunlap	b56cee70e7	bcachefs: six lock: fix typos Fix a few typos in the six.h header file. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Brian Foster <bfoster@redhat.com> Cc: linux-bcachefs@vger.kernel.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	f8fd5871be	bcachefs: reserve path idx 0 for sentinel Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	5028b9078c	bcachefs: Rename for_each_btree_key2() -> for_each_btree_key() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	27b2df982f	bcachefs: Kill for_each_btree_key() for_each_btree_key() handles transaction restarts, like for_each_btree_key2(), but only calls bch2_trans_begin() after a transaction restart - for_each_btree_key2() wraps every loop iteration in a transaction. The for_each_btree_key() behaviour is problematic when it leads to holding the SRCU lock that prevents key cache reclaim for an unbounded amount of time - there's no real need to keep it around. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	8c066edeb4	bcachefs: continue now works in for_each_btree_key2() continue now works as in any other loop Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	be1fa63de8	bcachefs: Fix bch2_read_btree() In the debugfs code, we had an incorrect use of drop_locks_do(); on transaction restart we don't want to restart the current loop iteration, since we've already emitted the current key to the buffer for userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	a0acc24fed	bcachefs: Fix open coded set_btree_iter_dontneed() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	267b801fda	bcachefs: BCH_IOCTL_FSCK_ONLINE This adds a new ioctl for running fsck on a mounted, in use filesystem. This reuses the fsck_thread code from the previous patch for running fsck on an offline, unmounted filesystem, so that log messages for the fsck thread are redirected to userspace. Only one running fsck instance is allowed at a time; a new semaphore (since the lock will be taken by one thread and released by another) is added for this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	8408fa570e	bcachefs: BCH_IOCTL_FSCK_OFFLINE This adds a new ioctl for running fsck on a list of devices. Normally, if we wish to use the kernel's implementation of fsck we'd run it at mount time with -o fsck. This ioctl lets us run fsck without mounting, so that userspace bcachefs-tools can transparently switch to the kernel's implementation of fsck when appropriate - primarily if the kernel version of bcachefs better matches the filesystem on disk. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	7f391b2f8e	bcachefs: bch2_run_online_recovery_passes() Add a new helper for running online recovery passes - i.e. online fsck. This is a subset of our normal recovery passes, and does not - for now - use or follow c->curr_recovery_pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	0953450af7	bcachefs: Mark recovery passes that are safe to run online Online fsck is coming, and many of our recovery/fsck passes are already safe to run while the filesystem is in use - mark which ones. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	2b41226d7f	bcachefs: Add ability to redirect log output Upcoming patches are going to add two new ioctls for running fsck in the kernel, but pretending that we're running our normal userspace fsck. This patch adds some plumbing for redirecting our normal log messages away from the dmesg log to a thread_with_file file descriptor - via a struct log_output, which will be consumed by the fsck f_op's read method. The new ioctls will allow for running fsck in the kernel against an offline filesystem (without mounting it), and an online filesystem. For an offline filesystem we need a way to pass in a pointer to the log_output, which is done via a new hidden opts.h option. For online fsck, we can set c->output directly, but only want to redirect log messages from the thread running fsck - hence the new c->output_filter method. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	bbefcd910d	bcachefs: thread_with_file Abstract out a new helper from the data job code, for connecting a kthread to a file descriptor. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	63508b7564	bcachefs: c->ro_ref Add a new refcount for async ops that don't necessarily need the fs to be RW, with similar lifetime/rules otherwise as c->writes. To be used by online fsck. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	483dea4431	bcachefs: Improve error message when finding wrong btree node single_device.merge_torture_flakey is, very rarely, finding a btree node that doesn't match the key that points to it: this patch improves the error message to print out more fields from the btree node header, so that we can see what else does or does not match the key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Brian Foster	5a11b5fe79	bcachefs: return from fsync on writeback error to avoid early shutdown When investigating transient failures of generic/441 on bcachefs, it was determined that the cause of the failure was a combination of unconditional emergency shutdown and racing between background journal activity and the test switchover from a working device mapper table to an error injecting table. Part of the reason for this sequence of events is that bcachefs aggressively flushes as much as possible during fsync(), regardless of errors. While this is reasonable behavior, it is technically unnecessary because once an error is returned from fsync(), the caller cannot make any assumptions about the resilience of data. Tweak the bch2_fsync() logic to return an error on failure of any of the steps involved in the flush. Note that this change alone does not prevent generic/441 failure, but in combination with a test tweak to avoid racing during the dm-error table switchover it avoids the unnecessary shutdowns and allows the test to pass reliably on bcachefs. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	56ec287d30	bcachefs: BCH_ERR_opt_parse_error Continuing the project of replacing generic error codes with more specific ones. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	6e92d15546	bcachefs: Refactor trans->paths_allocated to be standard bitmap Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	0d963a635d	bcachefs: Move reflink_p triggers into reflink.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Richard Davies	d4e4d8b98b	bcachefs: Remove obsolete comment about zstd Remove obsolete comment about zstd, since approach changed during development of commit `bbc3a46065` Signed-off-by: Richard Davies <richard@arachsys.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	a564c9fad5	bcachefs: Include btree_trans in more tracepoints This gives us more context information - e.g. which codepath is invoking btree node reads. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Brian Foster	d9e14a4eb9	bcachefs: remove sb lock and flags update on explicit shutdown bcachefs grabs s_umount and sets SB_RDONLY when the fs is shutdown via the ioctl() interface. This has a couple issues related to interactions between shutdown and freeze: 1. The flags == FSOP_GOING_FLAGS_DEFAULT case is a deadlock vector because freeze_bdev() calls into freeze_super(), which also acquires s_umount. 2. If an explicit shutdown occurs while the sb is frozen, SB_RDONLY alters the thaw path as if the sb was read-only at freeze time. This effectively leaks the frozen state and leaves the sb frozen indefinitely. The usage of SB_RDONLY here goes back to the initial bcachefs commit and AFAICT is simply historical behavior. This behavior is unique to bcachefs relative to the handful of other filesystems that support the shutdown ioctl(). Typically, SB_RDONLY is reserved for the proper remount path, which itself is restricted from modifying frozen superblocks in reconfigure_super(). Drop the unnecessary sb lock and flags update in bch2_ioc_goingdown() to address both of these issues. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	a56c61714a	bcachefs: Make backpointer fsck wb flush check more rigorous backpointers fsck now always runs in rw mode - the btree is being modified while it runs, by e.g. copygc, rebalance, the discard worker, the invalidate worker. We could find a missing backpointer, flush the btree write buffer, and then on the next iteration find a new key at the exact same position - which will most likely need another write buffer flush. Hence, we have to check for an exact match on last_flushed, not just the pos. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	0f64a6daaa	bcachefs: On missing backpointer to interior node, flush interior updates Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Daniel Hill	21e07cc966	bcachefs: remove redundant condition from data_update_index_update Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Daniel Hill	a79e1b6dea	bcachefs: copygc shouldn't try moving buckets on error Co-developed-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	3f0e297d86	bcachefs: Explicitly go RW for fsck This eliminates a lot of BCH_TRANS_COMMIT_lazy_rw flags, and is less error prone. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Daniel Hill	3ec3758a81	bcachefs: copygc should wakeup on shutdown if disabled Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Daniel Hill	0c069781dd	bcachefs: rebalance should wakeup on shutdown if disabled Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Daniel Hill	7452933880	bcachefs: remove dead bch2_evacuate_bucket() Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Gustavo A. R. Silva	62286a08c3	bcachefs: Replace zero-length arrays with flexible-array members Fake flexible arrays (zero-length and one-element arrays) are deprecated, and should be replaced by flexible-array members. So, replace zero-length arrays with flexible-array members in multiple structures. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	8a4b4c52c0	bcachefs: more write buffer refactoring prep work for big rewrite - no functional changes in this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	ab4fb4b678	bcachefs: wb_flush_one_slowpath() A bit of refactoring for better inlining in the main btree write buffer flush path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	48dade8176	bcachefs: ONLY_SPECIFIED_DEVS doesn't mean ignore durability anymore Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	a276132c2d	bcachefs: Don't open code bch2_dev_exists2() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	3398124444	bcachefs: Improve trace_trans_restart_would_deadlock In the CI, we're seeing tests failing due to excessive would_deadlock transaction restarts - the tracepoint now includes the lock cycle that occurred. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	e153a0d70b	bcachefs: Improve trace_trans_restart_too_many_iters() We now include the list of paths in use. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	7464403009	bcachefs: count_event() Small helper for event counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	cb13f47139	bcachefs: bch2_btree_write_buffer_flush() -> bch2_btree_write_buffer_tryflush() More accurate naming. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	d3083cf28d	bcachefs: bch2_btree_write_buffer_flush_locked() Minor refactoring - improved naming, and move the responsibility for flush_lock to the caller instead of having it be shared. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	183bcc89b8	bcachefs: Clean up btree write buffer write ref handling __bch2_btree_write_buffer_flush() now assumes a write ref is already held (as called by the transaction commit path); and the wrappers bch2_write_buffer_flush() and flush_sync() take an explicit write ref. This means internally the write buffer code can always use BTREE_INSERT_NOCHECK_RW, instead of in the previous code passing flags around and hoping the NOCHECK_RW flag was always carried around correctly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	cf5bacb6a5	bcachefs: delete useless commit_do() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	8ab3fa9639	bcachefs: kill journal->preres_wait Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	56db242951	bcachefs: Improve btree write buffer tracepoints - add a tracepoint for write_buffer_flush_sync; this is expensive - fix the write_buffer_flush_slowpath tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:39 -05:00
Kent Overstreet	c259bd95d1	bcachefs: No need to allocate keys for write buffer Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	3c471b6588	bcachefs: convert bch_fs_flags to x-macro Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	9e243d3cda	bcachefs: Kill journal_seq/gc args to bch2_dev_usage_update_m() This is only used by gc (fsck). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	3f59547e22	bcachefs: Refactor bch2_check_alloc_to_lru_ref() This code was somewhat convoluted - because originally bch2_lru_set() could modify the LRU index if there was a collision. That's no longer the case, so the "create LRU entry" path has no reason to update the alloc key, so we can separate the handling of the two fsck errors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	25d1e39df0	bcachefs: Add a rebalance, data_update tracepoints Add a tracepoint for rebalance, printing out - the target option - the compression option - the key being rebalanced Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	d05db12715	bcachefs: Print durability in member_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	7541787f58	bcachefs: Improve sysfs compression_stats Break it out by compression type, and include average extent size. Also, format into a nice table. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	9b34f02cdc	bcachefs: Kill dev_usage->buckets_ec This counter is redundant; it's simply the sum of BCH_DATA_stripe and BCH_DATA_parity buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	ed0cd515cd	bcachefs: bch2_dev_usage_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	dafff7e575	bcachefs: New bucket sector count helpers This introduces bch2_bucket_sectors() and bch2_bucket_sectors_dirty(), prep work for separately accounting stripe sectors. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	e6674decb2	bcachefs: BCH_IOCTL_DEV_USAGE_V2 BCH_IOCTL_DEV_USAGE mistakenly put the per-data-type array in struct bch_ioctl_dev_usage; since ioctl numbers encode the size of the arg, that means adding new data types breaks the ioctl. This adds a new version that includes the number of data types as a parameter: the old version is fixed at 10 so as to not break when adding new types. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	3b05b8e082	bcachefs: Simplify check_bucket_ref() We only need the sector count being modified. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	011173321f	bcachefs: six locks: Simplify optimistic spinning osq lock maintainers don't want it to be used outside of kernel/locking/ - but, we can do better. Since we have lock handoff signalled via waitlist entries, there's no reason for optimistic spinning to have to look at the lock at all - aside from checking lock-owner; we can just spin looking at our waitlist entry. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	ba11c7d67a	bcachefs: BCH_DATA_OP_drop_extra_replicas Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	3c843a6759	bcachefs: Convert bch2_move_btree() to bbpos Minor cleanup. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	01e9564540	bcachefs: x-macro-ify bch_data_ops enum This will let us add an enum -> string table for a to_text() fn. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Yang Li	225879f403	bcachefs: clean up one inconsistent indenting fs/bcachefs/journal_io.c:1843 bch2_journal_write_pick_flush() warn: inconsistent indenting Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7585 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Daniel Hill	2b161cc7cb	bcachefs: add a quieter bch2_read_super If we're looking for bcachefs superblocks iteratively we don't want to see this error. This function replaces KERN_ERR with KERN_INFO for when we don't find a bcachefs superblock but preserves other errors. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	25f64e997e	bcachefs: Don't use update_cached_sectors() in bch2_mark_alloc() bch2_update_cached_sectors_list() is closer to how the new disk space accounting works, called from trans_mark(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	086a52f7fa	bcachefs: Rename bch_replicas_entry -> bch_replicas_entry_v1 Prep work for introducing bch_replicas_entry_v2 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	ad9c7992eb	bcachefs: Kill btree_iter->journal_pos For BTREE_ITER_WITH_JOURNAL, we memoize lookups in the journal keys, to avoid the binary search overhead. Previously we stashed the pos of the last key returned from the journal, in order to force the lookup to be redone when rewinding. Now bch2_journal_keys_peek_upto() handles rewinding itself when necessary - so we can slim down btree_iter. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	1ae8a0904a	bcachefs: Kill memset() in bch2_btree_iter_init() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	ae0e61175e	bcachefs: Add a tracepoint for journal entry close Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	b27d7afb79	bcachefs: Don't flush journal after replay The flush_all_pins() after journal replay was unnecessary, and trying to completely flush the journal while RW is not a great idea - it's not guaranteed to terminate if other threads keep adding things to the journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	b4b79b0764	bcachefs: Don't rejournal keys in key cache flush Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	5fd24caf57	bcachefs: Fix userspace bch2_prt_datetime() ctime_r() outputs a newline, which we don't want. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	e56978c80d	bcachefs: Kill BTREE_ITER_ALL_LEVELS As discussed in the previous patch, BTREE_ITER_ALL_LEVELS appears to be racy with concurrent interior node updates - and perhaps it is fixable, but it's tricky and unnecessary. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	cd404e5b05	bcachefs: backpointers fsck no longer uses BTREE_ITER_ALL_LEVELS It appears that BTREE_ITER_ALL_LEVELS is racy with concurrent interior node btree updates; unfortunate but not terribly surprising it's a difficult problem - that was the original reason for gc_lock. BTREE_ITER_ALL_LEVELS will probably be deleted in a subsequent patch, this changes backpointers fsck to instead walk keys at one level of the btree at a time. This fixes the tiering_drop_alloc test, which stopped working with the patch to not flush the journal after journal replay. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	eb54e81f27	bcachefs: Improve btree_path_downgrade tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	cb52d23e77	bcachefs: Rename BTREE_INSERT flags BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	5927310dcf	bcachefs: bch_str_hash_flags_t Create a separate enum for str_hash flags - instead of abusing the btree_insert_flags enum - and create a __bitwise typedef for sparse typechecking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	aa62aabbc7	bcachefs: Kill dead BTREE_INSERT flags BTREE_INSERT_NOWAIT and BTREE_INSERT_GC_LOCK_HELD are no longer used, and can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	cd5bd16282	bcachefs: Fix redundant variable initialization path->level was being read, but never used. Reported-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	e17b93eb36	bcachefs: Avoiding dropping/retaking write locks in bch2_btree_write_buffer_flush_one() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	573224301c	bcachefs: Make journal replay more efficient Journal replay now first attempts to replay keys in sorted order, similar to how the btree write buffer flush path works. Any keys that can not be replayed due to journal deadlock are then left for later and replayed in journal order, unpinning journal entries as we go. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	bdde9829de	bcachefs: Go rw before journal replay This gets us slightly nicer log messages. Also, this slightly clarifies synchronization of c->journal_keys; after we go RW it's in use by multiple threads (so that the btree iterator code can overlay keys from the journal); so it has to be prepped before that point. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	43c7ede009	bcachefs: Kill BTREE_UPDATE_PREJOURNAL With the previous patch that reworks BTREE_INSERT_JOURNAL_REPLAY, we can now switch the btree write buffer to use it for flushing. This has the advantage that transaction commits don't need to take a journal reservation at all. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	9a71de675f	bcachefs: BTREE_INSERT_JOURNAL_REPLAY now "don't init trans->journal_res" This slightly changes how trans->journal_res works, in preparation for changing the btree write buffer flush path to use it. Now, BTREE_INSERT_JOURNAL_REPLAY means "don't take a journal reservation; trans->journal_res.seq already refers to the journal sequence number to pin". Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	389c92b36e	bcachefs: Clear k->needs_whiteout earlier in commit path The upcoming btree write buffer rework is going to use the journal itself as the first stage of the write buffer; this is a cleanup to make sure k->needs_whiteout is initialized before keys hit the journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	066a26460b	bcachefs: track_event_change() This introduces a new helper for connecting time_stats to state changes, i.e. when taking journal reservations is blocked for some reason. We use this to track separately the different reasons the journal might be blocked - i.e. space in the journal full, or the journal pin fifo full. Also do some cleanup and improvements on the time stats code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	3eedfe1af9	bcachefs: Journal pins must always have a flush_fn flush_fn is how we identify journal pins in debugfs - this is a debugging aid. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	df8e13ccf3	bcachefs: Add an assertion in bch2_journal_pin_set() Previously, bch2_journal_pin_set() would silently ignore a request to pin a journal sequence number that was no longer dirty, because it was used internally by bch2_journal_pin_copy() which could race with the src pin being flushed. Split these apart so that we can properly assert that @seq is a currently dirty journal sequence number - this is almost always a bug. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	fa5df9e7d5	bcachefs: Include average write size in sysfs journal_debug Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:36 -05:00
Kent Overstreet	09e0153b72	bcachefs: Fix warning when building in userspace bch_err() doesn't reference the fs arg in userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:36 -05:00
Kent Overstreet	fbf9270817	bcachefs: Print old version when scanning for old metadata Also: we should be using bch2_fs_read_write_early() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:36 -05:00
Kent Overstreet	7d9ae04e39	bcachefs: Fix locking when checking freespace btree On transaction restart, we weren't re-validating the hole we saw. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:36 -05:00
Kent Overstreet	359d1bad1b	bcachefs: Check for unlinked inodes not on deleted list Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:36 -05:00
Kent Overstreet	ecf8a74dab	bcachefs: kill INODE_LOCK, use lock_two_nondirectories() In an ideal world, we'd have a common helper that could be used for sorting a list of inodes into the correct lock order, and then the same lock ordering could be used for any type of inode lock, not just i_rwsem. But the lock ordering rules for i_rwsem are a bit complicated, so - abandon that dream for now and do it the more standard way. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:36 -05:00
Kent Overstreet	8b58623f5b	bcachefs: Improved backpointer messages in fsck When we have a key to print, we should print it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:36 -05:00
Kent Overstreet	e7f7ddedd6	bcachefs: Add extra verbose logging for ro path Also log time waiting for c->writes references to be dropped; this will help in debugging why unmounts are taking longer than they should. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:36 -05:00
Kent Overstreet	30418de09e	bcachefs: Flush fsck errors before running twice It's confusing if we run fsck a second time (in debug mode, to verify the second run is clean), but errors are still ratelimited from the first run. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:36 -05:00
Kent Overstreet	0d72ab35a9	bcachefs: make RO snapshots actually RO Add checks to all the VFS paths for "are we in a RO snapshot?". Note - we don't check this when setting inode options via our xattr interface, since those generally only affect data placement, not contents of data. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>	2024-01-01 11:47:07 -05:00
Kent Overstreet	84f1638795	bcachefs: bch_sb_field_downgrade Add a new superblock section that contains a list of { minor version, recovery passes, errors_to_fix } that is - a list of recovery passes that must be run when downgrading past a given version, and a list of errors to silently fix. The upcoming disk accounting rewrite is not going to be fully compatible: we're going to have to regenerate accounting both when upgrading to the new version, and also when downgrading from the new version, since the new method of doing disk space accounting is a completely different architecture based on deltas, and synchronizing them for every journal entry write to maintain compatibility is going to be too expensive and impractical. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:07 -05:00
Kent Overstreet	8b16413cda	bcachefs: bch_sb.recovery_passes_required Add two new superblock fields. Since the main section of the superblock is now full, we have to add a new variable length section for them - bch_sb_field_ext. - recovery_passes_required: recovery passes that must be run on the next mount - errors_silent: errors that will be silently fixed These are to improve upgrading and downgrading: these fields won't be cleared until after recovery successfully completes, so there won't be any issues with crashing partway through an upgrade or a downgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:07 -05:00
Kent Overstreet	808c680f2a	bcachefs: Add persistent identifiers for recovery passes The next patch will start to refer to recovery passes from the superblock; naturally, we now need identifiers that don't change, since the existing enum is in the order in which they are run and is not fixed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:07 -05:00
Kent Overstreet	560661d4ae	bcachefs: prt_bitflags_vector() similar to prt_bitflags(), but for ulong arrays Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:07 -05:00
Kent Overstreet	6b49b0f7e7	bcachefs: move BCH_SB_ERRS() to sb-errors_types.h we need BCH_SB_ERR_MAX in bcachefs.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:07 -05:00
Kent Overstreet	d9534cc9fc	bcachefs: fix buffer overflow in nocow write path BCH_REPLICAS_MAX isn't the actual maximum number of pointers in an extent, it's the maximum number of dirty pointers. We don't have a real restriction on the number of cached pointers, and we don't want a fixed size array here anyways - so switch to DARRAY_PREALLOCATED(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Reported-and-tested-by: Daniel J Blueman <daniel@quora.org>	2024-01-01 11:46:52 -05:00
Kent Overstreet	099dc5c29d	bcachefs: DARRAY_PREALLOCATED() Add support to darray for preallocating some number of elements. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:46:52 -05:00
Kent Overstreet	a58a6a58f5	bcachefs: Switch darray to kvmalloc() We sometimes use darrays for quite large buffers - the btree write buffer in particular needs large buffers, since it must be sized to hold all the write buffer keys outstanding in the journal. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:46:52 -05:00
Kent Overstreet	73ab9e0386	bcachefs: Factor out darray resize slowpath Move the slowpath (actually growing the darray) to an out-of-line function; also, add some helpers for the upcoming btree write buffer rewrite. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:46:52 -05:00
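
Several commits in this list (3c471b6588, 01e9564540) convert an enum to an x-macro so that a matching string table can be generated for printing. The following is a minimal sketch of that general technique, not the bcachefs code itself: the flag names and every `example_`/`EXAMPLE_` identifier are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag list: illustrative names, not the real bch_fs_flags. */
#define EXAMPLE_FS_FLAGS()	\
	x(started)		\
	x(rw)			\
	x(was_rw)		\
	x(stopping)

/* Expand the list once into an enum... */
enum example_fs_flags {
#define x(n)	EXAMPLE_FS_##n,
	EXAMPLE_FS_FLAGS()
#undef x
	EXAMPLE_FS_NR
};

/* ...and once more into a NULL-terminated string table in the same order. */
static const char * const example_fs_flag_strs[] = {
#define x(n)	#n,
	EXAMPLE_FS_FLAGS()
#undef x
	NULL
};

/* The two expansions cannot drift apart; check it at compile time anyway. */
static_assert(sizeof(example_fs_flag_strs) / sizeof(example_fs_flag_strs[0])
	      == EXAMPLE_FS_NR + 1, "string table must match enum");

/* Bounds-checked enum -> string lookup; unknown values do not index the table. */
static const char *example_fs_flag_str(unsigned flag)
{
	return flag < EXAMPLE_FS_NR ? example_fs_flag_strs[flag] : "(unknown)";
}

/* String -> enum lookup; returns -1 if the name is not in the list. */
static int example_fs_flag_from_str(const char *s)
{
	for (int i = 0; i < EXAMPLE_FS_NR; i++)
		if (!strcmp(example_fs_flag_strs[i], s))
			return i;
	return -1;
}
```

Adding a flag then means adding one `x(...)` line; the enum and the string table both pick it up. The bounds check in the lookup mirrors the concern raised by the "helpers for printing data types" commit, where values introduced by newer versions must not index past the table.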

3361 Commits
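
The BCH_IOCTL_DEV_USAGE_V2 commit (e6674decb2) notes that ioctl numbers encode the size of the argument, so growing a fixed array inside the argument struct changes the ioctl number. The sketch below illustrates that point with the standard Linux `_IOWR` and `_IOC_SIZE` macros. The structs, the magic byte `'x'` and the command number are hypothetical and are not the real bcachefs definitions.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <stdint.h>

/* Hypothetical argument struct with a fixed per-data-type array of 10. */
struct example_usage_fixed10 {
	uint64_t dev;
	uint64_t sectors[10];
};

/* The same struct after one more data type is added. */
struct example_usage_fixed11 {
	uint64_t dev;
	uint64_t sectors[11];
};

/*
 * A "v2"-style layout: the element count is passed as a field and the array
 * is a flexible-array member, so sizeof() stays the same as types are added.
 */
struct example_usage_v2 {
	uint64_t dev;
	uint32_t nr_data_types;
	uint32_t pad;
	uint64_t sectors[];
};

/* Same magic and command number; only the argument type differs. */
#define EXAMPLE_IOCTL_FIXED10	_IOWR('x', 11, struct example_usage_fixed10)
#define EXAMPLE_IOCTL_FIXED11	_IOWR('x', 11, struct example_usage_fixed11)
#define EXAMPLE_IOCTL_V2	_IOWR('x', 12, struct example_usage_v2)

/* Extract the argument size that is baked into an ioctl number. */
static unsigned example_ioctl_arg_size(unsigned long cmd)
{
	return _IOC_SIZE(cmd);
}
```

With these definitions, `EXAMPLE_IOCTL_FIXED10` and `EXAMPLE_IOCTL_FIXED11` are different numbers even though the magic and command number match, which is why a binary built against the old struct would stop matching the new ioctl. The v2-style struct keeps a constant encoded size because the variable part is described by `nr_data_types` instead of by the type.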