linux

iv/linux

Author	SHA1	Message	Date
Kent Overstreet	82fdc1dc98	bcachefs: bump max_active on btree_interior_update_worker WQ_UNBOUND with max_active 1 means ordered workqueue, but we don't actually need or want ordered semantics - and probably want a higher concurrency limit anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:09 -04:00
Kent Overstreet	ba89083e9f	bcachefs: Fix journal replay with unreadable btree roots When a btree root is unreadable, we still might be able to get some data back by replaying what's in the journal. Previously though, we got confused when journal replay would attempt to replay a key for a level that didn't exist. This adds bch2_btree_increase_depth(), so that journal replay can handle this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:18:13 -04:00
Kent Overstreet	4e07447503	bcachefs: Clamp replicas_required to replicas This prevents going emergency read only when the user has specified replicas_required > replicas. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-02-13 20:33:38 -05:00
Kent Overstreet	ec4edd7b9d	bcachefs: Prep work for variable size btree node buffers bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-21 13:27:10 -05:00
Kent Overstreet	f0431c5f47	bcachefs: Combine .trans_trigger, .atomic_trigger Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:20 -05:00
Kent Overstreet	717296c34c	bcachefs: trans_mark now takes bkey_s Prep work for disk space accounting rewrite: we're going to want to use a single callback for both of our current triggers, so we need to change them to have the same type signature first. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-05 23:24:19 -05:00
Kent Overstreet	ff70ad2c8d	bcachefs: Fix interior update path btree_path uses Since the btree_paths array is now about to become growable, we have to be careful not to refer to paths by pointer across contexts where they may be reallocated. This fixes the remaining btree_interior_update() paths - split and merge. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	d7e14035a4	bcachefs: get_unlocked_mut_path() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:44 -05:00
Kent Overstreet	b0b6737822	bcachefs: trans_for_each_path_with_node() no longer uses path->idx Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	ccb7b08fbb	bcachefs: trans_for_each_path() no longer uses path->idx path->idx is now a code smell: we should be using path_idx_t, since it's stable across btree path reallocation. This is also a bit faster, using the same loop counter vs. fetching path->idx from each path we iterate over. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	07f383c71f	bcachefs: btree_iter -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	96ed47d130	bcachefs: bch2_btree_path_traverse() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	f6363acaa6	bcachefs: bch2_btree_path_make_mut() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	74e600c19a	bcachefs; bch2_path_put() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	255ebbbf75	bcachefs: bch2_path_get() -> btree_path_idx_t Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	1a2a9f9f53	bcachefs: for_each_keylist_key() declares loop iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:43 -05:00
Kent Overstreet	a7dc10ce68	bcachefs: Make sure allocation failure errors are logged The previous patch fixed a bug in allocation path error handling, and it would've been noticed sooner had it been logged properly. Generally speaking, errors that shouldn't happen in normal operation and are being returned up the stack should be logged: the write path was already logging IO errors, but non IO errors were missed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	cf904c8d96	bcachefs: bch_err_(fn\|msg) check if should print Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	f33600057f	bcachefs: bch2_trans_node_add no longer uses trans_for_each_path() In the future we'll be making trans->paths resizable and potentially having _many_ more paths (for fsck); we need to start fixing algorithms that walk each path in a transaction where possible. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	24de63dacb	bcachefs: Improve trans->extra_journal_entries Instead of using a darray, we now allocate journal entries for the transaction commit path with our normal bump allocator - with an inlined fastpath, and using btree_transaction_stats to remember how much to initially allocate so as to avoid transaction restarts. This is prep work for converting write buffer updates to use this mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:41 -05:00
Kent Overstreet	a564c9fad5	bcachefs: Include btree_trans in more tracepoints This gives us more context information - e.g. which codepath is invoking btree node reads. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:40 -05:00
Kent Overstreet	3c471b6588	bcachefs: convert bch_fs_flags to x-macro Now we can print out filesystem flags in sysfs, useful for debugging various "what's my filesystem doing" issues. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:38 -05:00
Kent Overstreet	cb52d23e77	bcachefs: Rename BTREE_INSERT flags BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	aa62aabbc7	bcachefs: Kill dead BTREE_INSERT flags BTREE_INSERT_NOWAIT and BTREE_INSERT_GC_LOCK_HELD are no longer used, and can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	3eedfe1af9	bcachefs: Journal pins must always have a flush_fn flush_fn is how we identify journal pins in debugfs - this is a debugging aid. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-01-01 11:47:37 -05:00
Kent Overstreet	7ba1f6ec97	bcachefs; guard against overflow in btree node split Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-12-19 16:18:16 -05:00
Kent Overstreet	0fa3b97767	bcachefs: btree_node_u64s_with_format() takes nr keys Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-12-19 16:18:13 -05:00
Kent Overstreet	87b0d8d3d0	bcachefs: Fix a journal deadlock in replay Recently, journal pre-reservations were removed. They were for reserving space ahead of time in the journal for operations that are required for journal reclaim, e.g. btree key cache flushing and interior node btree updates. Instead we have watermarks - only operations for journal reclaim are allowed when the journal is low on space, and in general we're quite good about doing operations in the order that will free up space in the journal quickest when we're low on space. If we're doing a journal reclaim operation out of order, we usually do it in nonblocking mode if it's not freeing up space at the end of the journal. There's an exceptino though - interior btree node update operations have to be BCH_WATERMARK_reclaim - once they've been started, and they can't be nonblocking. Generally this is fine because they'll only be a very small fraction of transaction commits - but there's an exception, which is during journal replay. Journal replay does many btree operations, but doesn't need to commit them to the journal since they're already in the journal. So killing off of pre-reservation, plus another change to make journal replay more efficient by initially doing the replay in sorted btree order, made it possible for the interior update operations replay generates to fill and deadlock the journal. Fix this by introducing a new check on journal space at the _start_ of an interior update operation. This causes us to block if necessary in exactly the same way as we used to when interior updates took a journal pre-reservaiton, but without all the expensive accounting pre-reservations required. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-12-04 16:04:55 -05:00
Kent Overstreet	2111f39459	bcachefs: Fix race between btree writes and metadata drop btree writes update the btree node key after every write, in order to update sectors_written, and they also might need to drop pointers if one of the writes failed in a replicated btree node. But the btree node might also have had a pointer dropped while the write was in flight, by bch2_dev_metadata_drop(), and thus there was a bug where the btree node write would ovewrite the btree node's key with what it had at the start of the write. Fix this by dropping pointers not currently in the btree node key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-28 22:58:22 -05:00
Kent Overstreet	5510a4af52	bcachefs: Fix split_race livelock bch2_btree_update_start() calculates which nodes are going to have to be split/rewritten, so that we know how many nodes to reserve and how deep in the tree we have to take locks. But btree node merges require inserting two keys into the parent node, not just splits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-28 17:18:24 -05:00
Kent Overstreet	d4e3b928ab	closures: CLOSURE_CALLBACK() to fix type punning Control flow integrity is now checking that type signatures match on indirect function calls. That breaks closures, which embed a work_struct in a closure in such a way that a closure_fn may also be used as a workqueue fn by the underlying closure code. So we have to change closure fns to take a work_struct as their argument - but that results in a loss of clarity, as closure fns have different semantics from normal workqueue functions (they run owning a ref on the closure, which must be released with continue_at() or closure_return()). Thus, this patc introduces CLOSURE_CALLBACK() and closure_type() macros as suggested by Kees, to smooth things over a bit. Suggested-by: Kees Cook <keescook@chromium.org> Cc: Coly Li <colyli@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-24 00:29:58 -05:00
Kent Overstreet	006ccc3090	bcachefs: Kill journal pre-reservations This deletes the complicated and somewhat expensive journal pre-reservation machinery in favor of just using journal watermarks: when the journal is more than half full, we run journal reclaim more aggressively, and when the journal is more than 3/4s full we only allow journal reclaim to get new journal reservations. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-14 23:44:43 -05:00
Kent Overstreet	769b360049	bcachefs: Don't iterate over journal entries just for btree roots Small performance optimization, and a bit of a code cleanup too. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-05 13:12:18 -05:00
Kent Overstreet	6dfa10ab22	bcachefs: Fix build errors with gcc 10 gcc 10 seems to complain about array bounds in situations where gcc 11 does not - curious. This unfortunately requires adding some casts for now; we may investigate getting rid of our __u64 _data[] VLA in a future patch so that our start[0] members can be VLAs. Reported-by: John Stoffel <john@stoffel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-04 14:17:11 -04:00
Kent Overstreet	be9e782df3	bcachefs: Don't downgrade locks on transaction restart We should only be downgrading locks on success - otherwise, our transaction restarts won't be getting the correct locks and we'll livelock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:08 -04:00
Kent Overstreet	b65db750e2	bcachefs: Enumerate fsck errors This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repair, and hence what bugs to look for. Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-11-01 21:11:08 -04:00
Kent Overstreet	6bd68ec266	bcachefs: Heap allocate btree_trans We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Kent Overstreet	96dea3d599	bcachefs: Fix W=12 build errors Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:13 -04:00
Colin Ian King	7cb0e6992e	bcachefs: remove redundant initialization of pointer d The pointer d is being initialized with a value that is never read, it is being re-assigned later on when it is used in a for-loop. The initialization is redundant and can be removed. Cleans up clang-scan build warning: fs/bcachefs/buckets.c:1303:25: warning: Value stored to 'd' during its initialization is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	e46c181af9	bcachefs: Convert more code to bch_err_msg() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	5b7fbdcd5b	bcachefs: Fix silent enum conversion error This changes mark_btree_node_locked() to take an enum btree_node_locked_type, not a six_lock_type, since BTREE_NODE_UNLOCKED is -1 which may cause problems converting back and forth to six_lock_type if short enums are in use. With this change, we never store BTREE_NODE_UNLOCKED in a six_lock_type enum. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:12 -04:00
Kent Overstreet	93ee2c4b21	bcachefs: Don't open code closure_nr_remaining() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:10 -04:00
Kent Overstreet	401585fe87	bcachefs: btree_journal_iter.c Split out a new file from recovery.c for managing the list of keys we read from the journal: before journal replay finishes the btree iterator code needs to be able to iterate over and return keys from the journal as well, so there's a fair bit of code here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:10 -04:00
Kent Overstreet	bf5a261c7a	bcachefs: Assorted fixes for clang clang had a few more warnings about enum conversion, and also didn't like the opts.c initializer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:09 -04:00
Kent Overstreet	01e691e830	bcachefs: Fix a write buffer flush deadlock We're not supposed to block if BTREE_INSERT_JOURNAL_RECLAIM && watermark != BCH_WATERMARK_reclaim. This should really be a separate BTREE_INSERT_NONBLOCK flag - add some comments to that effect, it's not important for this patch. btree write buffer flush depends on this behaviour though - the first loop tries to flush sequentially, which doesn't free up space in the journal optimally. If that can't proceed we bail out and flush in journal order - that won't work if we're blocked instead of returning an error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:06 -04:00
Kent Overstreet	73bd774d28	bcachefs: Assorted sparse fixes - endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:06 -04:00
Kent Overstreet	faa6cb6c13	bcachefs: Allow for unknown btree IDs We need to allow filesystems with metadata from newer versions to be mountable and usable by older versions. This patch enables us to roll out new btrees without a new major version number; we can now handle btree roots for unknown btree types. The unknown btree roots will be retained, and fsck (including backpointers) will check them, the same as other btree types. We add a dynamic array for the extra, unknown btree roots, in addition to the fixed size btree root array, and add new helpers for looking up btree roots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:05 -04:00
Kent Overstreet	f33c58fc46	bcachefs: Kill BTREE_INSERT_USE_RESERVE Now that we have journal watermarks and alloc watermarks unified, BTREE_INSERT_USE_RESERVE is redundant and can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:05 -04:00
Kent Overstreet	65db60490a	bcachefs: Fix a null ptr deref in bch2_fs_alloc() error path This fixes a null ptr deref in bch2_free_pending_node_rewrites() when the list head wasn't initialized. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:05 -04:00
Kent Overstreet	ec14fc6010	bcachefs: Kill JOURNAL_WATERMARK This unifies JOURNAL_WATERMARK with BCH_WATERMARK; we're working towards specifying watermarks once in the transaction commit path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:05 -04:00

1 2 3 4 5 ...

291 Commits