This just removes a redundant comparison - there's more work we could do
here to remove some redundant copying.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Standard splitting out of the slow path from the fast path of a
function. We may follow this up in another patch with inlining the fast
path into btree_iter.c.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If the last journal write didn't complete successfully due to a torn
write, we'll detect it as a checksum error. In that case, we should just
pretend that journal entry was never written.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Print out the journal entries we read and will replay as soon as
possible - if we get an error validating keys it's helpful to know where
it was in the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
It turns out we need bch2_extent_trim_atomic() even when we're deleting
extents one at a time because it's possible for one reflink_p to
reference arbitrarily many reflink_v extents. This doesn't normally
happen, but the data move path can fragment existing extents in the
background.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
b->write_type needs to be set atomically with setting the
btree_node_need_write flag, so move it into b->flags.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- Centralize format strings in bcachefs.h
- Add bch2_fmt_inum_offset() and related helpers
- Switch error messages for inodes to also print out the offset, in
bytes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Warnings ought to always have a format string/log message - makes them
considerably more useful.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, when we exited from the loop body with a break statement
_ret wouldn't have been assigned to yet, and we could spuriously return
a transaction restart error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This improves the bkey_format calculation when splitting btree nodes.
Previously, we'd use a format calculated for the original node for the
lower of the two new nodes.
This was particularly bad on sequential insertions, where we iteratively
split the last btree node, whose format has to include KEY_MAX.
Now, we calculate formats precisely for the keys the two new nodes will
contain. This also should make splitting a bit more efficient, since
we're only copying keys once (from the original node to the new node,
instead of new node, replacement node, then upper split).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This switches where we take quota reservations to be per bch_write_op
instead of per dio_write, so we can drop the quota reservation in the
same place as we call i_sectors_acct(), and only take/release
ei_quota_lock once.
In the future we'd like ei_quota_lock to not be a mutex, so that we can
avoid punting to process context before delivering write completions in
nocow mode.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The genradix code can handle multiple threads trying to allocate at the
same time - we don't need the genradix_ptr_alloc() call to happen under
a lock.
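As an illustration only - a hypothetical caller with invented types, not
code from this patch - the allocation can now happen before taking
whatever lock protects the entry's contents:

  #include <linux/errno.h>
  #include <linux/generic-radix-tree.h>
  #include <linux/gfp.h>
  #include <linux/mutex.h>

  struct foo { int val; };

  struct ctx {
          GENRADIX(struct foo)    foo_table;
          struct mutex            lock;
  };

  static int get_or_create_foo(struct ctx *c, size_t idx)
  {
          /* genradix_ptr_alloc() is safe to call concurrently */
          struct foo *f = genradix_ptr_alloc(&c->foo_table, idx, GFP_KERNEL);

          if (!f)
                  return -ENOMEM;

          mutex_lock(&c->lock);
          f->val = 42;            /* initialize/publish under the lock */
          mutex_unlock(&c->lock);
          return 0;
  }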
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a regression from percpu freedlists in the btree key cache
code: in a rare error path, we were immediately freeing a bkey_cached
that had been used before and should've waited for an SRCU barrier.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
These were wrappers around atomic operations that verified that the
counter wasn't negative, but they're dead code - delete.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- Marking a non-static function as inline doesn't actually work and is
now causing problems - drop that
- Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages
with bcachefs (filesystem name)
- Userspace doesn't have real percpu variables (maybe we can get this
fixed someday), put an #ifdef around bch2_disk_reservation_add()
fastpath
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We have a unique lock used for controlling adding to the pagecache: the
lock has two states, where both states are shared - the lock may be held
multiple times for either state - but not both states at the same time.
This is exactly what we need for nocow mode locking, so this patch pulls
it out of fs.c into its own file.
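A minimal conceptual sketch of such a two-state shared lock - names and
implementation details here are invented for illustration, not the actual
bcachefs code:

  #include <linux/spinlock.h>
  #include <linux/wait.h>

  /* v > 0: held (shared) in state A; v < 0: held (shared) in state B */
  struct two_state_shared_lock_sketch {
          spinlock_t              lock;
          long                    v;
          wait_queue_head_t       wait;
  };

  static bool two_state_trylock(struct two_state_shared_lock_sketch *l, bool state_b)
  {
          bool ret = false;

          spin_lock(&l->lock);
          if (state_b ? l->v <= 0 : l->v >= 0) {
                  l->v += state_b ? -1 : 1;
                  ret = true;
          }
          spin_unlock(&l->lock);
          return ret;
  }

  static void two_state_lock(struct two_state_shared_lock_sketch *l, bool state_b)
  {
          wait_event(l->wait, two_state_trylock(l, state_b));
  }

  static void two_state_unlock(struct two_state_shared_lock_sketch *l, bool state_b)
  {
          spin_lock(&l->lock);
          l->v -= state_b ? -1 : 1;
          spin_unlock(&l->lock);
          wake_up_all(&l->wait);
  }

The real code may well use an atomic counter instead of a spinlock; the
sketch only shows the two-states-shared, mutually-exclusive semantics.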
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
BCH_WRITE_FLUSH is a write flag that causes a journal flush. It's only
used in the direct IO path, and this will allow for some consolidation
with the regular fsync path, which will help with the upcoming nocow
mode.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- factor out more slowpath code into non-inline function
- use bch2_print_string_as_lines(), so our error message doesn't get
truncated
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
btree_path_copy() doesn't need to call
bch2_btree_path_check_sort_fast() - the newly allocated path will always
be in the correct position, post copy; also delete some redundant
branches from __bch2_btree_path_make_mut().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- Don't call into bch2_encrypt_bio() when we're not encrypting
- Pull slowpath out of trans_lock_write()
- Make sure bch2_trans_journal_res_get() gets inlined.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- With BCH_WRITE_SYNC, we no longer need the completion in struct
dio_write
- Pull out bch2_dio_write_copy_iov() into a separate non-inline
function, it's code that doesn't run in the common case
- Copy mapping and inode pointers into dio_write, avoiding pointer
chasing at the start of bch2_dio_write_loop()
- kthread_use_mm() is not needed in the common case; move it into
bch2_dio_write_loop_async()
- factor out various helpers from bch2_dio_write_loop() and rework
control flow for better icache utilization
Other small optimizations:
- bch2_keylist_free() is only used in one place, at the end of the
bch2_write() path - drop the reinit
- in bch2_disk_reservation_put(), check if res->sectors is nonzero
before touching c->online_reserved, since that will likely be a cache
miss
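A sketch of that last point (simplified and hedged - not necessarily the
exact function body):

  void bch2_disk_reservation_put(struct bch_fs *c, struct disk_reservation *res)
  {
          if (res->sectors) {
                  this_cpu_sub(*c->online_reserved, res->sectors);
                  res->sectors = 0;
          }
  }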
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bcachefs: More DIO write path optimization
Better code prefetching (?)
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds a new flag for the write path, BCH_WRITE_SYNC, and switches
the O_DIRECT write path to use it when we're not running asynchronously.
It runs the btree update after the write in the original thread's
context instead of a kworker, cutting context switches in half.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This factors out a properly-documented helper for deciding when we want
to sort a btree node with MAX_BSETS bsets down to a single bset.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This replaces sysfs btree_avg_write_size with btree_write_stats, which
now breaks out statistics by the source of the btree write.
Btree writes that are too small are a source of inefficiency, and
excessive btree resort overhead - this will let us see what's causing
them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Per fstests generic/275, on -ENOSPC we're supposed to write until the
filesystem is full - i.e. do a partial write instead of failing the full
write.
This is a partial fix for the buffered write path: we'll still fail on a
page boundary.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- In the btree iterator code that overlays keys from the journal, we
were incorrectly specifying level=0 instead of the btree_path's
current level in a few places
- When we didn't do journal replay, we shouldn't free the journal keys:
this fixes cmd_list and cmd_dump, which run in norecovery mode
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Use __func__ in error messages that refer to function name, and do so
more uniformly - suggested by checkpatch.pl
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds an inlined version of bch2_bkey_cmp_packed(), and uses it in
bch2_sort_keys(), where it's part of the inner loop.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Long ago, bkey_unpack_key() was added to bset.h instead of bkey.h
because bkey.h didn't include btree_types.h, which it needs for the
compiled unpack function.
This patch finally moves it to the proper location.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This moves the JOURNAL_REPLAY_DONE flag check from
bch2_trans_iter_init() to bch2_trans_init(), where we stash a copy in
btree_trans - gaining us a small performance improvement.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:
  ASSIGN_IN_IF
  UNSPECIFIED_INT    - bcachefs coding style prefers single token type names
  NEW_TYPEDEFS       - typedefs are occasionally good
  FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully
                       with docbook documentation), not .h file prototypes
  MULTISTATEMENT_MACRO_USE_DO_WHILE
                     - we have _many_ x-macros and other macros where we
                       can't do this
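For reference, one way to apply a list like this is checkpatch's standard
--ignore option, e.g. in a .checkpatch.conf file (exact placement of the
file is up to us):

  --ignore ASSIGN_IN_IF,UNSPECIFIED_INT,NEW_TYPEDEFS,FUNCTION_ARGUMENTS,MULTISTATEMENT_MACRO_USE_DO_WHILE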
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- add bch2_dev_usage_read_fast(), which doesn't return by value -
bch_dev_usage is big enough that we don't want the silent memcpy
- tweak the allocation path to only call bch2_dev_usage_read() once per
bucket allocated
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
crc.compression_type & nonce get reset inside bch2_rechecksum_bio(); we
set them back to the previously calculated values. This fixes
incompressible extents being marked as uncompressed.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This shouldn't be needed anymore, since we don't rely on the pointer
validity that this was guarding against anymore - we get a new good
reference and save it right after this function.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This separates out the slowpath of bch2_trans_update_by_path_trace()
into a new non-inlined helper.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We don't want to fire the bucket_alloc_fail tracepoint on transaction
restart, when we can retry immediately - only when the allocation
actually has to block.
Also, switch from sched_clock() to local_clock(), as we've been doing
elsewhere.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now we store the transaction's fn idx in a local variable, instead of
redoing the lookup every time we call bch2_trans_init().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The shrinker assumes freed key cache items are ordered by age, so that
it doesn't have to scan the full list to find items that are old enough
(according to the srcu code) to be freed.
But percpu freelists broke this ordering; this patch fixes this by
ensuring we insert items into the proper position.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
A single block can't be compressed, so it's incompressible.
This stops rebalance repeatedly marking extents as uncompressed.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Sometimes the user may need to change durability after formatting to
match the current hardware setup; this option provides a quick and
flexible alternative to removing and then re-adding the device.
It is HIGHLY ADVISED TO RUN REREPLICATE after changing this value so the
system doesn't remain degraded.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Appending new nodes to the end of the list means we're more likely to
evict old entries when btree_cache_scan() is started.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- We now correctly allow soft limits to be exceeded, instead of always
returning -EDQUOT
- Disk quota grace times/warnings can now be set, not just the
systemwide defaults
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
local_clock() isn't always completely accurate - e.g. on machines with
TSC drift - but ktime_get_ns() overhead is too high, unfortunately.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- In userspace, we don't have real percpu variables; this patch
disables the percpu freelists in userspace
- add some error messages for the asserts in
bch2_fs_btree_key_cache_exit(); we've been hitting this (only in
userspace, oddly), perhaps this will help us track down the error.
- bkey_cached_reuse() should likely be taking the key cache lock, and
it's a slowpath so it doesn't hurt to do so
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We were forgetting to count down the number of nodes to prefetch, firing
off _way_ more than intended - whoops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We don't actually allocate memory under the btree key cache lock - so
there's no recursion concerns, and the shrinker can just use
mutex_lock().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
On journal read, previously we would do full journal entry validation
immediately after reading a journal entry.
However, this would lead to errors for journal entries we weren't
actually going to use, either because they were too old or too new
(newer than the most recent flush).
We've observed write tearing on journal entries newer than the newest
flush - which makes sense, prior to a flush there's no guarantees about
write persistence.
This patch defers full journal entry validation until the end of the
journal read path, when we know which journal entries we'll want to use.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Prep work for the next patch, to defer journal entry validation: we now
track for each replica whether we had a good checksum.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This module provides a fast 64-bit implementation of basic statistics
functions, including mean, variance and standard deviation in both
weighted and unweighted variants; the unweighted variant has a 32-bit
limitation per sample to prevent overflow when squaring.
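To illustrate just the unweighted variant (a sketch with invented names,
not the module's actual API):

  #include <linux/math64.h>
  #include <linux/types.h>

  /* running sums; 32-bit samples keep x * x from overflowing the sum of squares */
  struct stats_sketch {
          u64     n;
          s64     sum;
          u64     sum_squares;
  };

  static void stats_update(struct stats_sketch *s, s32 x)
  {
          s->n++;
          s->sum += x;
          s->sum_squares += (u64) ((s64) x * x);
  }

  static s64 stats_mean(const struct stats_sketch *s)
  {
          return s->n ? div64_s64(s->sum, s->n) : 0;
  }

  static u64 stats_variance(const struct stats_sketch *s)
  {
          s64 mean = stats_mean(s);

          /* E[x^2] - E[x]^2; standard deviation is then int_sqrt() of this */
          return s->n ? div64_u64(s->sum_squares, s->n) - mean * mean : 0;
  }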
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When modifying a file, we may be required to drop the suid/sgid bits -
we were missing a file_modified() call to do this.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
An error case was jumping to the wrong label, creating an infinite loop
- oops.
This fixes fstests generic/648.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We were removing 1 more entry than we were supposed to - oops.
Also some other simplifications and cleanups, and bring back the abort
preference code in a better fashion.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We already have support for the flag's semantics: inode options are
inherited by children if they were explicitly set on the parent. This
patch just maps the FS_XFLAG_PROJINHERIT flag to the "this option was
explicitly set" bit.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is the right thing to do, and conforms with our own behaviour on
rename and xfs's behaviour on hardlink.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
For compliance with other quota implementations, we should be
initializing quota information with a default 1 week timelimit: this
fixes fstests generic/235.
Also, this adds to_text() functions for some quota structs - useful
debugging aids.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
btree nodes can be written by other threads (shrinker, journal reclaim)
with only a read lock, but brand new nodes should only be written by the
thread doing the split/interior update. bch2_btree_update_add_new_node()
sets btree node flags to indicate that this is a new node and should not
be written out by other threads, thus we need to call it before dropping
our write lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds a new helper, quota_reserve_range(), which takes a quota
reservation for unallocated blocks in a given file range, and uses it in
bch2_remap_file_range().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the drop_alloc tests, we may end up calling
bch2_btree_iter_peek_slot() on an interior level that doesn't exist.
Previously, this would hit the path->uptodate assertion in
bch2_btree_path_peek_slot(); now we check for a NULL btree node first,
which is how we know we're at the end of the btree.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The btree iterator code may allocate extra btree paths, temporarily,
that do not refer to keys being returned: we don't need to wait until
transaction restart to drop these, when they're not referenced they
should be deleted right away.
This fixes a transaction path overflow bug.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Negating without casting to a signed integer means the value wasn't
getting sign extended properly - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, bch2_btree_update_start() would always take all intent
locks, all the way up to the root.
We've finally got data from users where this became a scalability issue
- so, this patch fixes bch2_btree_update_start() to only take the locks
we need.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now that we have an error path plumbed through, there's no need to be
using bch2_btree_node_lock_write_nofail().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The next patch in the series is (finally!) going to change btree splits
(and interior updates in general) to not take intent locks all the way
up to the root - instead only locking the nodes they'll need to modify.
However, this will be introducing a race since if we're not holding a
write lock on a btree node it can be written out by another thread, and
then we might not have enough space for a new bset entry.
We can handle this by retrying - we just need to introduce a new error
path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In order to avoid locking all btree nodes up to the root for btree node
splits, we're going to have to introduce a new error path into
bch2_btree_insert_node(); this means we can't have done any writes or
modified global state before that point.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We'd like to prioritize aborting transactions that have done less work -
however, it appears breaking cycles by telling other threads to abort
may still be buggy, so disable that for now.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Some lock operations can't fail; a cycle of nofail locks is impossible
to recover from. So we want to get rid of these nofail locking
operations, but as this is tricky it'll be done incrementally.
If such a cycle happens, this patch prints out which codepaths are
involved so we know what to work on next.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cached pointers are generally dropped, not moved: this led to an
assertion firing in the data update path when there were no new replicas
being written.
This patch adds a data_options field for pointers to be dropped, and
tweaks move_extent() to check if we're only dropping pointers, not
writing new ones, before kicking off a data update operation.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When errors=panic, we want to make sure we print the error before
calling bch2_inconsistent_error().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
btree_node_lock_nopath() is something we'd like to get rid of: it's
always prone to deadlocks if we're accidentally holding other locks,
because it doesn't mark the lock it's taking in a path. We'll want to
get rid of it in the future, but for now this patch works around it by
calling bch2_trans_unlock().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This changes bch2_check_for_deadlock() to print the longest chains it
finds - when we have a deadlock because the cycle detector isn't finding
something, this will let us see what it's missing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We were incorrectly returning -BCH_ERR_insufficient_devices when we'd
received a different error from bch2_bucket_alloc_trans(), which
(erroneously) turns into -EROFS further up the call chain.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_btree_delete_range_trans() was using btree_trans_too_many_iters()
to avoid path overflow, but this was buggy here (and also
btree_trans_too_many_iters() is suspect in general).
btree_trans_too_many_iters() only returns true when we're close to the
maximum number of paths - within 8 - but extent insert/delete assumes
that it can use more paths than that.
Instead, we need to call bch2_trans_begin() on every loop iteration.
Since we don't want to call bch2_trans_begin() (restarting the outer
transaction) if the call was a no-op - if we had no work to do - we have
to structure things a bit oddly.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This refactoring puts our various allocation path counters into a
dedicated struct - the upcoming nocow patch is going to add another
counter.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
There was a rare bug when path->locks_want was nonzero, but not
BTREE_MAX_DEPTH, where we'd return on a valid node that wasn't locked -
oops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This used to be needed more for buffered IO, but now the block layer has
writeback throttling - we can delete this now.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
It now includes more info - whether the bucket was for metadata or data
- and we also call it in the same place as the bucket_alloc_fail
tracepoint.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This function is fairly small and only used in two places: one very hot,
the other cold, so it should definitely be inlined.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This moves the slowpath of check_pos_snapshot_overwritten() to a
separate function, and inlines the fast path - helping performance on
btrees that don't use snapshots and for users that aren't using
snapshots.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, jset_validate() was formatting the initial part of an error
string for every entry it was validating - expensive.
This moves that code to journal_entry_err_msg(), which is now only
called if there's an actual error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- move slowpath code to a separate function, btree_path_overflow()
- no need to use hweight64
- copy nr_max_paths from btree_transaction_stats to btree_trans,
avoiding a data dependency in the fast path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We need counters to be initialized before initializing shrinkers - the
shrinker callbacks will update those counters. This fixes a segfault in
userspace.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We've seen long error messages get truncated here, so convert to the new
bch2_print_string_as_lines().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- factor out fsck_err_get()
- if the "bcachefs (%s):" prefix has already been applied, don't
duplicate it
- convert to printbufs instead of static char arrays
- tidy up control flow a bit
- use bch2_print_string_as_lines(), to avoid messages getting truncated
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds a helper for printing a large buffer one line at a time, to
avoid the 1k printk limit.
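The idea, as a rough sketch (invented name, not the actual helper's
signature):

  #include <linux/printk.h>
  #include <linux/string.h>

  static void print_string_as_lines_sketch(const char *str)
  {
          while (*str) {
                  const char *p = strchrnul(str, '\n');

                  /* one printk() per line, so no single call hits the ~1k limit */
                  printk(KERN_ERR "%.*s\n", (int) (p - str), str);
                  str = *p ? p + 1 : p;
          }
  }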
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Most of the node_relock_fail trace events are generated from
bch2_btree_path_verify_level(), when debugcheck_iterators is enabled -
but we're not interested in these trace events, they don't indicate that
we're in a slowpath.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We're still seeing OOM issues caused by the btree node cache shrinker
not sufficiently freeing memory: thus, this patch changes the shrinker
to not exit if __GFP_FS was not supplied.
Instead, tweak btree node memory allocation so that we never invoke
memory reclaim while holding the btree node cache lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is a major oopsy - we should always be unlocking before calling
closure_sync(), else we'll cause a deadlock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We were checking for -EAGAIN, but that's not what's returned when we
didn't pass a closure to wait with - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Before we had the deadlock cycle detector, we didn't want to be holding
read locks when taking intent locks, because blocking on an intent lock
while holding a read lock was a lock ordering violation that could
cause a deadlock.
With the cycle detector this is no longer an issue, so this code can be
deleted.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
In order for bch2_btree_node_lock_write_nofail() to never produce a
deadlock, we must ensure we're never holding read locks when using it.
Fortunately, it's only used from code paths where any read locks may be
safely dropped.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the event that we're not finished debugging the cycle detector, this
adds a new file to debugfs that shows what the cycle detector finds, if
anything. By comparing this with btree_transactions, which shows held
locks for every btree_transaction, we'll be able to determine if it's
the cycle detector that's buggy or something else.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We've outgrown our own deadlock avoidance strategy.
The btree iterator API provides an interface where the user doesn't need
to concern themselves with lock ordering - different btree iterators can
be traversed in any order. Without special care, this will lead to
deadlocks.
Our previous strategy was to define a lock ordering internally, and
whenever we attempt to take a lock and trylock() fails, we'd check if
the current btree transaction is holding any locks that cause a lock
ordering violation. If so, we'd issue a transaction restart, and then
bch2_trans_begin() would re-traverse all previously used iterators, but
in the correct order.
That approach had some issues, though.
- Sometimes we'd issue transaction restarts unnecessarily, when no
deadlock would have actually occurred. Lock ordering restarts have
become our primary cause of transaction restarts, on some workloads
totalling 20% of actual transaction commits.
- To avoid deadlock or livelock, we'd often have to take intent locks
when we only wanted a read lock: with the lock ordering approach, it
is actually illegal to hold _any_ read lock while blocking on an intent
lock, and this has been causing us unnecessary lock contention.
- It was getting fragile - the various lock ordering rules are not
trivial, and we'd been seeing occasional livelock issues related to
this machinery.
So, since bcachefs is already a relational database masquerading as a
filesystem, we're stealing the next traditional database technique and
switching to a cycle detector for avoiding deadlocks.
When we block taking a btree lock, after adding ourself to the waitlist
but before sleeping, we do a DFS of btree transactions waiting on other
btree transactions, starting with the current transaction and walking
our held locks, and transactions blocking on our held locks.
If we find a cycle, we emit a transaction restart. Occasionally (e.g.
the btree split path) we can not allow the lock() operation to fail, so
if necessary we'll tell another transaction that it has to fail.
Result: trans_restart_would_deadlock events are reduced by a factor of
10 to 100, and we'll be able to delete a whole bunch of grotty, fragile
code.
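A self-contained sketch of the cycle check, with invented types (the real
code works on btree_trans and its held locks; this recursive form is just
for illustration):

  #include <linux/types.h>

  /* each transaction knows which transactions are blocked on locks it holds */
  struct trans_node {
          struct trans_node       **waiters;
          unsigned                nr_waiters;
          bool                    visited;        /* DFS bookkeeping, reset between checks */
  };

  static bool would_deadlock(struct trans_node *start, struct trans_node *t)
  {
          unsigned i;

          if (t->visited)
                  return false;
          t->visited = true;

          for (i = 0; i < t->nr_waiters; i++) {
                  struct trans_node *w = t->waiters[i];

                  /*
                   * 'start' already put itself on the waitlist of the lock it
                   * wants, so reaching it again closes a cycle:
                   */
                  if (w == start || would_deadlock(start, w))
                          return true;
          }

          return false;
  }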
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Previously, if we were trying to upgrade from a read to an intent lock
but we held an additional read lock via another btree_path,
bch2_btree_node_upgrade() would always fail, in six_lock_tryupgrade().
This patch factors out the code that __bch2_btree_node_lock_write() uses
to temporarily drop extra read locks, so that six_lock_tryupgrade() can
succeed.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This brings back an important optimization, to avoid touching the wait
lists an extra time, while preserving the property that a thread is on a
lock waitlist iff it is waiting - it is never removed from the waitlist
until it has the lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
There was a lost wakeup between a read unlock in percpu mode and a write
lock. The unlock path unlocks, then executes a barrier, then checks for
waiters; correspondingly, the lock side should set the wait bit and
execute a barrier, then attempt to take the lock.
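Illustrating the pairing with an invented structure (not the six lock
internals): with full barriers on both sides, at least one side must
observe the other, so the wakeup can't be lost.

  #include <linux/atomic.h>

  struct demo_lock {
          atomic_t        readers;
          atomic_t        waiters;
  };

  /* returns true if the caller should wake up waiters */
  static bool demo_read_unlock(struct demo_lock *l)
  {
          atomic_dec(&l->readers);
          smp_mb__after_atomic();         /* publish "reader gone" before checking */
          return atomic_read(&l->waiters) != 0;
  }

  /* returns true if the lock can be taken now; otherwise sleep and retry */
  static bool demo_write_lock_check(struct demo_lock *l)
  {
          atomic_inc(&l->waiters);
          smp_mb__after_atomic();         /* publish "waiter present" before checking */
          return !atomic_read(&l->readers);
  }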
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is needed by the cycle detector in bcachefs - we need a way to
iterate over waitlist entries while dropping and retaking the waitlist
lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This switches to a single list of waiters, instead of separate lists for
read and intent, and switches write locks to also use the wait lists
instead of being handled differently.
Also, removal from the wait list is now done by the process waiting on
the lock, not the process doing the wakeup. This is needed for the new
deadlock cycle detector - we need tasks to stay on the waitlist until
they've successfully acquired the lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Continuing the saga of introducing private dedicated error codes for
each error path, this patch converts ENOSPC to error codes that are
subtypes of ENOSPC. We've recently had a test failure where we got
-ENOSPC where we shouldn't have, and didn't have enough information to
tell where it came from, so this patch will solve that problem.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The next patch is going to be adding private error codes for all the
places we return -ENOSPC.
Additionally, this patch updates return paths at all module boundaries
to call bch2_err_class(), to return the standard error code.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With the new deadlock cycle detector, it's critical that all held locks
be marked in a btree_path, because that's what the cycle detector
traverses - any locks that aren't correctly marked will cause deadlocks.
This changes the btree split path to allocate some btree_paths for the
new nodes, since until the final update is done we otherwise don't have
a path referencing them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Centralizing the transaction restart/tracepoint in
bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits
old and new locks_want.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Spotted a lockup once that appeared to be a lost wakeup. Adding a manual
trigger for lock wakeups will make it easy to tell if that's what it is
next time it occurs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We have counters with longer names now, so adjust the tabstop - also,
make sure there's always a space printed between the name and the
number.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When subvolumes & snapshots were rolled out, hash_redo_key() was
disabled due to some new complications - namely, bch2_hash_set() works
at the subvolume level, and fsck does not run in a defined subvolume,
instead working at the snapshot ID level.
This patch splits out bch2_hash_set_snapshot() from bch2_hash_set(), and
makes one small tweak for fsck:
- Normally, bch2_hash_set() (and other dirent code) needs to know what
subvolume we're in, because dirents that point to other subvolumes
should only be visible in the subvolume they were created in, not
other snapshots. We can't check that in fsck, so we just assume that
all dirents are visible.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This removes an optimization that didn't actually save us any memory,
due to alignment, but did make the code more complicated than it needed
to be. We were also seeing a bug where journal_seq_base wasn't getting
correctly initialized, so hopefully it'll fix that too.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Little bit of tidying up, this makes the counters a little bit clearer
as to what's happening.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With the upcoming cycle detector, we have to be careful about using
btree_node_lock_nopath - in particular, using it to take write locks can
cause deadlocks.
All held locks need to be tracked in a btree_path, so that the cycle
detector knows about them - unless we know that we cannot cause
deadlocks for other reasons: e.g. we are only taking read locks, or
we're in very early fsck (topology repair).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Similar to "bcachefs: Fix usage of six lock's percpu mode", six locks
have a percpu mode, but we can't switch between percpu and non percpu
modes while a lock is in use: threads attempting to take a read lock may
race, and we'll end up with the read count permanently off.
Fixing this the "correct" way, in six_lock_pcpu_(alloc|free) would
require an RCU barrier, and we don't want to do that - instead, we have
to permanently segregate percpu and non percpu objects, including when
on freelists.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Ideally, all the code in btree_locking.c should be converted, but then
we'd want to convert btree_path to point to btree_key_cached_common too,
and then we'd be in for a much bigger cleanup - but a bit of incremental
cleanup will still be helpful for the next patches.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a type descriptor to btree_bkey_cached_common - there's no reason
not to since we've got padding that was otherwise unused, and this is a
nice cleanup (and helpful in later patches).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Have to be careful with bit fields - when subtracting, this was
overflowing into the write_locking bit.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Taking a write lock will be able to fail, with the new cycle detector -
unless we pass it nofail, which is possible but not preferred.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the future, with the new deadlock cycle detector, we won't be using
bare six_lock_* anymore: lock wait entries will all be embedded in
btree_trans, and we will need a btree_trans context whenever locking a
btree node.
This patch plumbs a btree_trans to the few places that need it, and adds
two new locking functions
- btree_node_lock_nopath, which may fail returning a transaction
restart, and
- btree_node_lock_nopath_nofail, to be used in places where we know we
cannot deadlock (i.e. because we're holding no other locks).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
six locks are unfair: while a thread is blocked trying to take a write
lock, new read locks will fail. The new deadlock cycle detector makes
use of our existing lock tracing, so we need to tell it we're holding a
write lock before we take the lock for it to work correctly.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Since we've now got time_stats for lock hold times (per btree
transaction), we don't need this anymore.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
On failure to get a journal pre-reservation because we're called from
journal reclaim we're not supposed to return a transaction restart error
- this fixes a livelock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
It now prints the error name when the btree node is an error pointer;
also, don't trace failures when the btree node is
BCH_ERR_no_btree_node_up.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- Don't decrease BTREE_ITER_MAX when building with CONFIG_LOCKDEP
anymore. The lockdep table sizes are configurable now, we don't need
this anymore.
- btree_trans_too_many_iters() is less conservative now. Previously it
was causing a transaction restart if we had used more than
BTREE_ITER_MAX / 2 paths, change this to BTREE_ITER_MAX - 8.
This helps with excessive transaction restarts/livelocks in the bucket
allocator path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The upcoming lock cycle detection code will need to know precisely which
locks every btree_trans is holding, including write locks - this patch
updates btree_node_locked_type to include write locks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Improve our debugfs output, to help in debugging deadlocks: this shows,
for every btree node we print, the current number of readers/intent
locks/write locks held.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch
- tracks maximum bch2_trans_kmalloc() memory used in btree_transaction_stats
- makes it available in debugfs
- switches bch2_trans_init() to using that for the amount of memory to
preallocate, instead of the parameter passed in
This drastically reduces transaction restarts, and means we no longer
need to track this in the source code.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
six_lock_count now counts up whether a write lock is held, and this patch
now also correctly counts six_lock->intent_lock_recurse.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Previously, we used two different bit arrays for tracking held btree
node locks. This patch switches to an array of two bit integers, which
will let us track, in a future patch, when we hold a write lock.
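An illustrative version of the encoding (simplified names and widths, not
the exact bcachefs definitions):

  #include <linux/types.h>

  enum node_locked_type {
          NODE_UNLOCKED           = 0,
          NODE_READ_LOCKED        = 1,
          NODE_INTENT_LOCKED      = 2,
          /* the fourth value is left free for tracking write locks later */
  };

  static inline unsigned get_node_locked_type(u64 nodes_locked, unsigned level)
  {
          return (nodes_locked >> (level * 2)) & 3;
  }

  static inline u64 set_node_locked_type(u64 nodes_locked, unsigned level,
                                         enum node_locked_type type)
  {
          nodes_locked &= ~(3ULL << (level * 2));
          nodes_locked |= (u64) type << (level * 2);
          return nodes_locked;
  }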
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Held btree locks are tracked in btree_path->nodes_locked and
btree_path->nodes_intent_locked. Upcoming patches are going to change
the representation in struct btree_path, so this patch switches to
proper helpers instead of direct access to these fields.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Start to centralize some of the locking code in a new file; more locking
code will be moving here in the future.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Device labels are represented as pointers in the member info section: we
need to get and then set the label for it to be kept correctly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The new convention is that functions that handle transaction restarts
within an existing transaction context should return
-BCH_ERR_transaction_restart_nested when they did so, since they
invalidated the outer transaction context.
This also means bch2_btree_delete_range_trans() is changed to only call
bch2_trans_begin() after a transaction restart, not on every loop
iteration.
This is to fix a bug in fsck, in check_inode() when we truncate an inode
with BCH_INODE_I_SIZE_DIRTY set.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- fsck_inode_rm() wasn't returning BCH_ERR_transaction_restart_nested
- change bch2_trans_verify_not_restarted() to call panic() - we don't
want these errors to be missed
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
iter->k needs to be consistent with iter->pos - required for
bch2_btree_iter_(rewind|advance) to work correctly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When returning a key from the key cache, in BTREE_ITER_WITH_KEY_CACHE
mode, we don't want to set should_be_locked on iter->path; we're not
returning a key from that path, so we don't need to, and also since we
traversed the key cache iterator before setting should_be_locked on that
path it might be unlocked (if we unlocked, bch2_trans_relock() won't
have relocked it).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
For debugging the eytzinger search tree code, and low level bkey packing
code, it can be helpful to see things in binary: this patch improves our
helpers for doing so.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We should be calling btree_node_mem_ptr_set() before path_level_init(),
since we already touched the key that btree_node_mem_ptr_set() will
modify and path_level_init() will be doing the lookup in the child btree
node we're recursing to.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Topology repair may change btree node min/max keys: when it does so, we
need to always rebuild eytzinger search trees because nodes directly
depend on those values.
This fixes a bug found by the 'kill_btree_node' test, where we'd pop an
assertion in bch2_bset_search_linear().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
For now this is just a BUG_ON() - we may want to change this to return
an error in the future.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This improves flush_buf() so that it always returns nonzero when we're
done reading and ready to return to userspace, and so that it returns
the value we want to return to userspace (number of bytes read, if there
wasn't an error).
In the future we'll be better abstracting this mechanism and pulling it
out of bcachefs, and using it to replace seq_file.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We were iterating starting at BCACHEFS_ROOT_INO, but snapshots start at
POS_MIN - meaning this code was never getting run.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Reported-by: Olexa Bilaniuk <obilaniu@gmail.com>
Instead of counting transaction restarts, count when the transaction is
restarted: if bch2_trans_begin() was called when the transaction wasn't
restarted we need to ensure restart_count is still incremented.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Turns out this assertion was something we could legitimately hit - add a
comment describing what's going on, and handle it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We need a way to check if the machinery for handling btree_paths within
a transaction is behaving reasonably, as it often has not been - we've
had bugs with transaction path overflows caused by duplicate paths and
plenty of other things.
This patch tracks, per transaction fn, the most btree paths ever
allocated by that transaction and makes it available in debugfs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Going to be adding more things to this in the next patch.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes an assertion about unexpected transaction restarts -
bch2_btree_delete_range_trans() handles transaction restarts.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch2_path_put() is supposed to drop paths that aren't needed on
transaction restart, or to hold locks that we're supposed to keep until
transaction commit: but it was failing to free paths in some cases that
it should have, leading to transaction path overflows with lots of
duplicate paths.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
These were used more prior to getting rid of the in-memory bucket arrays
- they don't serve much purpose anymore, and deleting them lets us write
better assertions.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Our types are exported to the tracepoint code, so it's not necessary to
break things out individually when passing them to tracepoints - we can
also call other functions from TP_fast_assign().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This was noticed when a test hit this error and didn't fail, because
fsck wasn't returning that it fixed errors.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
It doesn't make any sense to set should_be_locked on btree_paths that
aren't locked, and is often a bug - this patch adds assertions and fixes
some of those bugs.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- use strlcpy(), not strncpy()
- add tracepoints for btree_path alloc and free
- give the tracepoint for key cache upgrade fail a proper name
- add a tracepoint for btree_node_upgrade_fail
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Clearing path->preserve means the path will be dropped in
bch2_trans_begin() - but on transaction restart, we're likely to need
that path again.
This fixes a livelock in the allocation path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch2_btree_trans_to_text() is used to print btree_transactions owned by
other threads; thus, it needs to be particularly careful. This fixes a
null ptr deref caused by racing with the owning thread changing
path->l[].b.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>