linux

iv/linux

Author	SHA1	Message	Date
Kent Overstreet	9fcdd23b6e	bcachefs: Add a comment for BTREE_INSERT_NOJOURNAL usage BTREE_INSERT_NOJOURNAL is primarily used for a performance optimization related to inode updates and fsync - document it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Kent Overstreet	d3c7727bb9	bcachefs: rebalance_work btree is not a snapshots btree rebalance_work entries may refer to entries in the extents btree, which is a snapshots btree, or they may also refer to entries in the reflink btree, which is not. Hence rebalance_work keys may use the snapshot field but it's not required to be nonzero - add a new btree flag to reflect this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 22:19:13 -04:00
Kent Overstreet	c4accde498	bcachefs: Ensure srcu lock is not held too long The SRCU read lock that btree_trans takes exists to make it safe for bch2_trans_relock() to deref pointers to btree nodes/key cache items we don't have locked, but as a side effect it blocks reclaim from freeing those items. Thus, it's important to not hold it for too long: we need to differentiate between bch2_trans_unlock() calls that will be only for a short duration, and ones that will be for an unbounded duration. This introduces bch2_trans_unlock_long(), to be used mainly by the data move paths. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 14:17:11 -04:00
Kent Overstreet	50a38ca1ba	bcachefs: Fix btree_node_type enum More forwards compatibility fixups: having BKEY_TYPE_btree at the end of the enum conflicts with unnkown btree IDs, this shifts BKEY_TYPE_btree to slot 0 and fixes things up accordingly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-31 12:18:37 -04:00
Kent Overstreet	6bd68ec266	bcachefs: Heap allocate btree_trans We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Kent Overstreet	96dea3d599	bcachefs: Fix W=12 build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Kent Overstreet	c872afa224	bcachefs: Fix bch2_propagate_key_to_snapshot_leaves() When we handle a transaction restart in a nested context, we need to return -BCH_ERR_transaction_restart_nested because we invalidated the outer context's iterators and locks. bch2_propagate_key_to_snapshot_leaves() wasn't doing this, this patch fixes it to use trans_was_restarted(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	8079aab085	bcachefs: Split up btree_update_leaf.c We now have btree_trans_commit.c btree_update.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:10 -04:00
Kent Overstreet	f26c67f4a7	bcachefs: Snapshot depth, skiplist fields This extents KEY_TYPE_snapshot to include some new fields: - depth, to indicate depth of this particular node from the root - skip[3], skiplist entries for quickly walking back up to the root These are to improve bch2_snapshot_is_ancestor(), making it O(ln(n)) instead of O(n) in the snapshot tree depth. Skiplist nodes are picked at random from the set of ancestor nodes, not some fixed fraction. This introduces bcachefs_metadata_version 1.1, snapshot_skiplists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:06 -04:00
Kent Overstreet	73bd774d28	bcachefs: Assorted sparse fixes - endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:06 -04:00
Kent Overstreet	9473cff989	bcachefs: Fix more lockdep splats in debug.c Similar to previous fixes, we can't incur page faults while holding btree locks. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:04 -04:00
Kent Overstreet	d95dd378c2	bcachefs: allocate_dropping_locks() Add two new helpers for allocating memory with btree locks held: The idea is to first try the allocation with GFP_NOWAIT\|__GFP_NOWARN, then if that fails - unlock, retry with GFP_KERNEL, and then call trans_relock(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	b5fd75669a	bcachefs: drop_locks_do() Add a new helper for the common pattern of: - trans_unlock() - do something - trans_relock() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	e47a390aa5	bcachefs: Convert -ENOENT to private error codes As with previous conversions, replace -ENOENT uses with more informative private error codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:03 -04:00
Kent Overstreet	f154c3eb42	bcachefs: trans_for_each_path_safe() bch2_btree_trans_to_text() is used on btree_trans objects that are owned by different threads - when printing out deadlock cycles - so we need a safe version of trans_for_each_path(), else we race with seeing a btree_path that was just allocated and not fully initialized: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	32913f49f5	six locks: Seq now only incremented on unlock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	1fb4fe6317	six locks: Kill six_lock_state union As suggested by Linus, this drops the six_lock_state union in favor of raw bitmasks. On the one hand, bitfields give more type-level structure to the code. However, a significant amount of the code was working with six_lock_state as a u64/atomic64_t, and the conversions from the bitfields to the u64 were deemed a bit too out-there. More significantly, because bitfield order is poorly defined (#ifdef __LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the sequence number would overflow into the rest of the bitfield if the compiler didn't put the sequence number at the high end of the word. The new code is a bit saner when we're on an architecture without real atomic64_t support - all accesses to lock->state now go through atomic64_*() operations. On architectures with real atomic64_t support, we additionally use atomic bit ops for setting/clearing individual bits. Text size: 7467 bytes -> 4649 bytes - compilers still suck at bitfields. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:02 -04:00
Kent Overstreet	d67a16df9c	bcachefs: Move bch2_bkey_make_mut() to btree_update.h It's for doing updates - this is where it belongs, and next pathes will be changing these helpers to use items from btree_update.h. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:01 -04:00
Kent Overstreet	bcb79a51cb	bcachefs: bch2_bkey_get_iter() helpers Introduce new helpers for a common pattern: bch2_trans_iter_init(); bch2_btree_iter_peek_slot(); - bch2_bkey_get_iter_type() returns -ENOENT if it doesn't find a key of the correct type - bch2_bkey_get_val_typed() copies the val out of the btree to a (typically stack allocated) variable; it handles the case where the value in the btree is smaller than the current version of the type, zeroing out the remainder. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:00 -04:00
Kent Overstreet	ab158fce47	bcachefs: Converting to typed bkeys is now allowed for err, null ptrs Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:00 -04:00
Kent Overstreet	511b629aca	bcachefs: bch2_btree_iter_peek_node_and_restart() Minor refactoring for the Rust interface. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	0f2ea6550f	bcachefs: bch2_btree_iter_peek_and_restart_outlined() Needed for interfacing with Rust - bindgen can't handle inline functions, alas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	5e2d8be8bd	bcachefs: Split trans->last_begin_ip and trans->last_restarted_ip These are two different things - this improves our debug assert messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:53 -04:00
Kent Overstreet	73d86dfd88	bcachefs: Fix erasure coding locking This adds a new helper, bch2_trans_mutex_lock(), for locking a mutex - dropping and retaking btree locks as needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:53 -04:00
Kent Overstreet	c72f687a1f	bcachefs: Use for_each_btree_key_upto() more consistently It's important that in BTREE_ITER_FILTER_SNAPSHOTS mode we always use peek_upto() and provide an end for the interval we're searching for - otherwise, when we hit the end of the inode the next inode be in a different subvolume and not have any keys in the current snapshot, and we'd iterate over arbitrarily many keys before returning one. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	12344c7cb9	bcachefs: bch2_trans_in_restart_error() This replaces various BUG_ON() assertions with panics that tell us where the restart was done and the restart type. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	4e3d18991a	bcachefs: Inline bch2_btree_path_traverse() fastpath Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:50 -04:00
Kent Overstreet	313816363a	bcachefs: bch2_trans_relock_notrace() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:49 -04:00
Kent Overstreet	ee2c6ea776	bcachefs: btree_iter->ip_allocated In debug mode, we now track where btree iterators and paths are initialized/allocated - helpful in tracking down btree path overflows. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:49 -04:00
Kent Overstreet	994ba47543	bcachefs: New btree helpers This introduces some new conveniences, to help cut down on boilerplate: - bch2_trans_kmalloc_nomemzero() - performance optimiation - bch2_bkey_make_mut() - bch2_bkey_get_mut() - bch2_bkey_get_mut_typed() - bch2_bkey_alloc() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:48 -04:00
Kent Overstreet	e88a75ebe8	bcachefs: New bpos_cmp(), bkey_cmp() replacements This patch introduces - bpos_eq() - bpos_lt() - bpos_le() - bpos_gt() - bpos_ge() and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:47 -04:00
Kent Overstreet	c96f108b05	bcachefs: Optimize bch2_trans_iter_init() When flags & btree_id are constants, we can constant fold the entire calculation of the actual iterator flags - and the whole thing becomes small enough to inline. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:46 -04:00
Kent Overstreet	3bce138373	bcachefs: Fix for_each_btree_key2() Previously, when we exited from the loop body with a break statement _ret wouldn't have been assigned to yet, and we could spuriously return a transaction restart error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:46 -04:00
Kent Overstreet	0f35e0860a	bcachefs: Fix return code from btree_path_traverse_one() trans->restarted is a positive error code, not the usual negative Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:46 -04:00
Kent Overstreet	b2d1d56b1d	bcachefs: Fixes for building in userspace - Marking a non-static function as inline doesn't actually work and is now causing problems - drop that - Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages with bcachefs (filesystem name) - Userspace doesn't have real percpu variables (maybe we can get this fixed someday), put an #ifdef around bch2_disk_reservation_add() fastpath Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:46 -04:00
Kent Overstreet	307e3c1319	bcachefs: Optimize bch2_trans_init() Now we store the transaction's fn idx in a local variable, instead of redoing the lookup every time we call bch2_trans_init(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:44 -04:00
Kent Overstreet	13bc41a715	bcachefs: bch2_trans_locked() Useful debugging function. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:42 -04:00
Kent Overstreet	14d8f26ad0	bcachefs: Inline bch2_trans_kmalloc() fast path Small performance optimization. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:42 -04:00
Kent Overstreet	33bd5d0686	bcachefs: Deadlock cycle detector We've outgrown our own deadlock avoidance strategy. The btree iterator API provides an interface where the user doesn't need to concern themselves with lock ordering - different btree iterators can be traversed in any order. Without special care, this will lead to deadlocks. Our previous strategy was to define a lock ordering internally, and whenever we attempt to take a lock and trylock() fails, we'd check if the current btree transaction is holding any locks that cause a lock ordering violation. If so, we'd issue a transaction restart, and then bch2_trans_begin() would re-traverse all previously used iterators, but in the correct order. That approach had some issues, though. - Sometimes we'd issue transaction restarts unnecessarily, when no deadlock would have actually occured. Lock ordering restarts have become our primary cause of transaction restarts, on some workloads totally 20% of actual transaction commits. - To avoid deadlock or livelock, we'd often have to take intent locks when we only wanted a read lock: with the lock ordering approach, it is actually illegal to hold _any_ read lock while blocking on an intent lock, and this has been causing us unnecessary lock contention. - It was getting fragile - the various lock ordering rules are not trivial, and we'd been seeing occasional livelock issues related to this machinery. So, since bcachefs is already a relational database masquerading as a filesystem, we're stealing the next traditional database technique and switching to a cycle detector for avoiding deadlocks. When we block taking a btree lock, after adding ourself to the waitlist but before sleeping, we do a DFS of btree transactions waiting on other btree transactions, starting with the current transaction and walking our held locks, and transactions blocking on our held locks. If we find a cycle, we emit a transaction restart. Occasionally (e.g. the btree split path) we can not allow the lock() operation to fail, so if necessary we'll tell another transaction that it has to fail. Result: trans_restart_would_deadlock events are reduced by a factor of 10 to 100, and we'll be able to delete a whole bunch of grotty, fragile code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:41 -04:00
Kent Overstreet	e4215d0fec	bcachefs: All held locks must be in a btree path With the new deadlock cycle detector, it's critical that all held locks be marked in a btree_path, because that's what the cycle detector traverses - any locks that aren't correctly marked will cause deadlocks. This changes the btree_path to allocate some btree_paths for the new nodes, since until the final update is done we otherwise don't have a path referencing them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:40 -04:00
Kent Overstreet	674cfc2624	bcachefs: Add persistent counters for all tracepoints Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:39 -04:00
Kent Overstreet	b1cdc398ae	bcachefs: Make more btree_paths available - Don't decrease BTREE_ITER_MAX when building with CONFIG_LOCKDEP anymore. The lockdep table sizes are configurable now, we don't need this anymore. - btree_trans_too_many_iters() is less conservative now. Previously it was causing a transaction restart if we had used more than BTREE_ITER_MAX / 2 paths, change this to BTREE_ITER_MAX - 8. This helps with excessive transaction restarts/livelocks in the bucket allocator path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:39 -04:00
Kent Overstreet	616928c30f	bcachefs: Track maximum transaction memory This patch - tracks maximum bch2_trans_kmalloc() memory used in btree_transaction_stats - makes it available in debugfs - switches bch2_trans_init() to using that for the amount of memory to preallocate, instead of the parameter passed in This drastically reduces transaction restarts, and means we no longer need to track this in the source code. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:39 -04:00
Kent Overstreet	cd5afabea1	bcachefs: btree_locking.c Start to centralize some of the locking code in a new file; more locking code will be moving here in the future. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:39 -04:00
Kent Overstreet	f0d2e9f2e5	bcachefs: Add assertions for unexpected transaction restarts Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:38 -04:00
Kent Overstreet	c497df8b85	bcachefs: Increment restart count in bch2_trans_begin() Instead of counting transaction restarts, count when the transaction is restarted: if bch2_trans_begin() was called when the transaction wasn't restarted we need to ensure restart_count is still incremented. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:38 -04:00
Kent Overstreet	5c0bb66ae3	bcachefs: Track the maximum btree_paths ever allocated by each transaction We need a way to check if the machinery for handling btree_paths with in a transaction is behaving reasonably, as it often has not been - we've had bugs with transaction path overflows caused by duplicate paths and plenty of other things. This patch tracks, per transaction fn, the most btree paths ever allocated by that transaction and makes it available in debugfs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:38 -04:00
Kent Overstreet	9f96568c0a	bcachefs: Tracepoint improvements Our types are exported to the tracepoint code, so it's not necessary to break things out individually when passing them to tracepoints - we can also call other functions from TP_fast_assign(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:38 -04:00
Kent Overstreet	17047fbced	bcachefs: Fix incorrectly freeing btree_path in alloc path Clearing path->preserve means the path will be dropping in bch2_trans_begin() - but on transaction restart, we're likely to need that path again. This fixes a livelock in the allocation path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:37 -04:00
Kent Overstreet	4f84b7e30b	bcachefs: for_each_btree_key_reverse() This adds a new macro, like for_each_btree_key2(), but for iterating in reverse order. Also, change for_each_btree_key2() to properly check the return value of bch2_btree_iter_advance(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:37 -04:00

1 2 3 4

171 Commits