linux

iv/linux

Author	SHA1	Message	Date
Kent Overstreet	326568f18c	bcachefs: Convert bch2_gc_done() for_each_btree_key2() This converts bch2_gc_stripes_done() and bch2_gc_reflink_done() to the new for_each_btree_key_commit() macro. The new for_each_btree_key2() and for_each_btree_key_commit() macros handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:36 -04:00
Kent Overstreet	eace11a730	bcachefs: Convert more fsck code to for_each_btree_key2() The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:36 -04:00
Kent Overstreet	1329c7ce56	bcachefs: Convert more quota code to for_each_btree_key2() The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:36 -04:00
Kent Overstreet	1615505cdf	bcachefs: Convert bch2_check_lrus() to for_each_btree_key_commit() The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:36 -04:00
Kent Overstreet	ca91f40ff7	bcachefs: Convert bch2_dev_freespace_init() to for_each_btree_key_commit() The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:36 -04:00
Kent Overstreet	4910a9506c	bcachefs: Convert bch2_do_discards_work() to for_each_btree_key2() The new for_each_btree_key2() macro handles transaction retries, allowing us to avoid nested transactions - which we want to avoid since they're tricky to do completely correctly and upcoming assertions are going to be checking for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:36 -04:00
Kent Overstreet	8ef9831399	bcachefs: Improve bucket_alloc_fail tracepoint We should be printing the number of free buckets, not just the number of available buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:36 -04:00
Kent Overstreet	f501ad2b81	bcachefs: bch2_mark_alloc(): Do wakeups after updating usage We have an obvious wake up race if we do the wakeup _before_ updating the counters the thing doing the waiting is reading. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:36 -04:00
Daniel Hill	c807ca95a6	bcachefs: added lock held time stats We now record the length of time btree locks are held and expose this in debugfs. Enabled via CONFIG_BCACHEFS_LOCK_TIME_STATS. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:35 -04:00
Daniel Hill	25055c690f	bcachefs: bch2_time_stats_to_text now indents properly Printbufs indentation feature doesn't yet work with '\n' and '\t'. So we've replaced all instances of '\n' with prt_newline. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:35 -04:00
Daniel Hill	8bfe14e86a	bcachefs: lock time stats prep work. We need the caller name and a place to store our results, btree_trans provides this. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:35 -04:00
Kent Overstreet	43de721a33	bcachefs: Unlock in bch2_trans_begin() if we've held locks more than 10us We try to ensure we never hold btree locks for too long - bcachefs tries to be soft realtime. This adds a check when restarting a transaction, where a transaction restart is cheap - if we've been holding locks for too long, drop and retake them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	a1783320d4	bcachefs: for_each_btree_key2() This introduces two new macros for iterating through the btree, with transaction restart handling - for_each_btree_key2() - for_each_btree_key_commit() Every iteration is now in an implicit transaction, and - as with lockrestart_do() and commit_do() - returning -EINTR will cause the transaction to be restarted, at the same key. This patch converts a bunch of code that was open coding this to these new macros, saving a substantial amount of code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	0d06b4eca6	bcachefs: Fix repair for extent past end of inode When we find an extent past an inode's i_size, we need to do the deletion in the inode's snapshot (which will emit a whiteout if necessary); and we also need to note that we now have an a key at that position and snapshot, so that we don't go into an infinite loop. Also, switch to walking inodes in reverse older, oldest snapshot to newest, so that we emit the fewest whiteouts possible. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	c7a09cb1b1	bcachefs: When fsck finds redundant snapshot keys, trigger snapshots cleanup Fsck now checks for keys in different snapshot IDs that are now redundant due to other snapshots being deleted - it needs to for its own algorithms to not get confused. When it detects this it should re-run the post snapshot deletion cleanup - this patch does that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	35f1a5034d	bcachefs: Improve fsck for subvols/snapshots - Bunch of refactoring, and move some code out of bch2_snapshots_start() and into bch2_snapshots_check(), for constency with the rest of fsck - Interior snapshot nodes no longer point to a subvolume; this is so we don't end up with dangling subvol references when deleting or require scanning the full snapshots btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	49124d8a7f	bcachefs: Improve snapshots_seen This makes the snapshots_seen data structure fsck private and improves it; we now also track the equivalence class for each snapshot id we've seen, which means we can detect when snapshot deletion hasn't finished or run correctly (which will otherwise confuse fsck). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	4ab35c34d5	bcachefs: Fix subvol/snapshot deleting in recovery fsck doesn't want to run while we're cleaning up deleted snapshots - if that work needs to be done, we want it to have finished before fsck runs, otherwise fsck will get confused when it finds multiple keys in the same snapshot ID equivalence class (i.e. the mechanism that snapshot deletion uses for cleaning up redundant keys). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	e4085b70f2	bcachefs: fsck_inode_rm() shouldn't delete subvols We should never see an inode marked as unlinked that's a subvolume root (or a directory) in fsck, but even if we do it's not correct for fsck to delete the subvolume: subvolumes are owned by dirents, and if we find a dangling subvolume (not marked as unlinked) we want fsck to reattach it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	597dee1cd6	bcachefs: Switch data_update path to snapshot_id_list snapshots_seen is becoming private to fsck, and snapshot_id_list is actually what the data update path needs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	416cc426c0	bcachefs: Fix snapshot deletion Snapshots being deleted won't in general have a corresponding subvolume: this fixes a spurious fsck error where we'd complain about a snapshot pointing to a missing subvolume - but the subvolume had been deleted, and the snapshot was pending deletion as well. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	e68914ca84	bcachefs: Rename __bch2_trans_do() -> commit_do() Better/more descriptive naming, and prep for adding nested_lockrestart_do() and nested_commit_do(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	80b3bf33d3	bcachefs: Silence some fsck errors when reconstructing alloc info There's no need to print fsck errors for errors that are expected, and the user has already opted to repair. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	1534ebb706	bcachefs: Put some repair messages behind opts->verbose These messages log the updates we're doing in bch2_check_fix_ptrs(), which is useful when debugging but not usually needed. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	e28307a106	bcachefs: Silence unimportant tracepoints Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	7c0732b88d	bcachefs: Fix move path when move_stats == NULL This isn't done very often, but it is legitimate Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	4081ace307	bcachefs: Get ref on c->writes in move.c There's no point reading an extent in order to move it if the write is going to fail because we're shutting down. This patch changes the move path so that moving_io now owns a ref on c->writes - as a bonus, rebalance and copygc will now notice that we're shutting down and exit quicker. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	0337cc7eee	bcachefs: move.c refactoring - add bch2_moving_ctxt_(init\|exit) - split out __bch2_evacutae_bucket() which takes an existing moving_ctxt, this will be used for improving copygc performance by pipelining across multiple buckets Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:35 -04:00
Daniel Hill	c91996c50a	bcachefs: data jobs, including rebalance wait for copygc. move_ratelimit() now has a bool that specifies whether we want to wait for copygc to finish. When copygc is running, we're probably low on free buckets instead of consuming the remaining buckets, we want to wait for copygc to finish. This should help with performance, and run away bucket fragmentation. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:35 -04:00
Kent Overstreet	7f5c5d20f0	bcachefs: Redo data_update interface This patch significantly cleans up and simplifies the data_update interface. Instead of only being able to specify a single pointer by device to rewrite, we're now able to specify any or all of the pointers in the original extent to be rewrited, as a bitmask. data_cmd is no more: the various pred functions now just return true if the extent should be moved/updated. All the data_update path does is rewrite existing replicas, or add new ones. This fixes a bug where with background compression on replicated filesystems, where rebalance -> data_update would incorrectly drop the wrong old replica, and keep trying to recompress an extent pointer and each time failing to drop the right replica. Oops. Now, the data update path doesn't look at the io options to decide which pointers to keep and which to drop - it only goes off of the data_update_options passed to it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	47ab0c5f6a	bcachefs: Fix bch2_check_alloc_key() bch2_check_alloc_key() was failing to check buckets that didn't have alloc keys yet (because they'd never been used) - they still need to be added to the freespace btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	e34da43e33	bcachefs: Improve bch2_check_alloc_info - In check_alloc_key(), previously we were re-initializing iterators for the need_discard and freespace btrees for every alloc key we checked. But this was causing us to redo lookups into the journal keys every time, since those lookups are cached in struct btree_iter. This initializes the iterators in bch2_check_alloc_info and passes them into check_alloc_key(). - Make the looping more consistent/efficient in bch2_check_alloc_info() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	22add2ec67	bcachefs: Use BTREE_INSERT_LAZY_RW in bch2_check_alloc_info() This runs before we go rw for journal replay, but after we're allowed to go rw. It might be time to consider killing BTREE_INSERT_LAZY_RW, though. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	3858536744	bcachefs: Bucket invalidate path improvements - invalidate_one_bucket() now returns 1 when we don't have any buckets on this device to invalidate, ensuring we don't spin - the tracepoint invocation is moved to after the transaction commit, and we now include the number of cached sectors in the tracepoint Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	962ad1a766	bcachefs: Don't BUG_ON() inode link count underflow This switches that assertion to a bch2_trans_inconsistent() call, as it should be. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	7a47d0993b	bcachefs: Always descend to leaf nodes it btree_gc If a btree node is unreadable, it's the topology repair that fixes that and it's kicked off by btree_gc, so btree_gc needs to touch every node and very that they can be read. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Daniel Hill	58aaa0836b	bcachefs: fix __dev_available(). __dev_available() now calculates available buckets correctly. Previously it would almost always return 0 when we have cached data. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:34 -04:00
Kent Overstreet	2817d45381	bcachefs: Fix assertion in topology repair If we were at the end of the node, when breaking out of the loop we'd pop the assertion on line 446 when cur wasn't NULL. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	5a3c24714c	bcachefs: Make verbose option settable at runtime -o verbose is very useful, and we're starting to use it more for runtime debug statements - making it possible to enable at runtime is a no brainer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	54feff0a7a	bcachefs: Improve "copygc requested to run" error message This improves the "copygc requested to run but no buckets found" to show the device that requires copygc to be run on - we'll definitely need to improve this more. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	c501fef6de	bcachefs: Pull out data_update.c This is the start of reorganizing the data IO paths. The plan is to also break apart io.c into data_read.c and data_write.c, and migrate_write will be renamed to the data_update path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:34 -04:00
Kent Overstreet	30f0349d62	bcachefs: Split out dev_buckets_free() Previously, dev_buckets_available() only counted buckets that are eligible to be allocated right now - i.e. buckets that don't have cached data, or need discard, or need gc gens, etc. But most users of this function want to know how many buckets are eligible to be allocated from without moving data around - copygc, allocator striping, which means we should be including cached data buckets etc. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	8f7f566f57	bcachefs: btree key cache pcpu freedlist Originally, the btree key cache code would always allocate new entries by reusing from the recently-freed list, if that list wasn't empty. But that behaviour was dropped, for lock contention reasons. But it seems that entries stranded on the freed list have been contributing to some of our oom issues, because long running btree transactions will prevent them from being freed. This patch re-adds allocating from the freed list, but it also adds percpu buffers to solve the lock contention issues - and the new percpu freed lists will improve the evict paths, too. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	7bb61e8c0e	bcachefs: Make IO in flight by copygc/rebalance configurable This adds a new option, move_bytes_in_flight, for configuring the amount of IO in flight by copygc/rebalance - users with many devices in their filesystem will want to increase this. In the future we should be smarter about this, but this is an easy improvement. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	b5f73fd79f	bcachefs: Check for extents with too many ptrs We have a hardcoded maximum on number of pointers in an extent that's used by some other data structures - notably bch_devs_list - but we weren't actually checking for it. Oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	1c6ff39445	bcachefs: Fix refcount leak in bch2_do_invalidates() If we fail to queue the work item because it's already in process, we need to drop the ref we just took. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	a3d7afa5c1	bcachefs: Always use percpu_ref_tryget_live() on c->writes If we're trying to get a ref and the refcount has been killed, it means we're doing an emergency shutdown - we always want tryget_live(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	23189da9eb	bcachefs: Improve checksum error messages We're seeing checksum errors in the bch2_rechecksum_bio() path - give it a better error message to help track this down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	50b13beef0	bcachefs: Improve an error message When inserting a key type that's not valid for a given btree, we should print out which btree we were inserting into. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:34 -04:00
Kent Overstreet	2ed6248ab3	bcachefs: Fix assertion in bch2_dev_list_add_dev() We were only allowing 4 devices in a dev_list, not 16. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00

1 2 3 4 5 ...

1217017 Commits