iv/linux

Author	SHA1	Message	Date
Kent Overstreet	4e92cbb642	bcachefs: More debug code improvements Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:48 -04:00
Kent Overstreet	1c74cec10c	bcachefs: Add more debug checks tracking down a bug where we see a btree node pointer in the wrong node Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:47 -04:00
Kent Overstreet	0b5c9f5940	bcachefs: Set preallocated transaction mem to avoid restarts this will reduce transaction restarts, from observation of tracepoints. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:47 -04:00
Kent Overstreet	6a747c4683	bcachefs: Add accounting for dirty btree nodes/keys This lets us improve journal reclaim, so that it now tries to make sure no more than 3/4s of the btree node cache and btree key cache are dirty - ensuring the shrinkers can free memory. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:46 -04:00
Kent Overstreet	811d2bcd85	bcachefs: Drop typechecking from bkey_cmp_packed() This only did anything in two places, and those can just be replaced with bkey_cmp_left_packed(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:46 -04:00
Kent Overstreet	7807e14384	bcachefs: Convert various code to printbuf printbufs know how big the buffer is that was allocated, so we can get rid of the random PAGE_SIZEs all over the place. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:43 -04:00
Kent Overstreet	760992aac8	bcachefs: Ensure we wake up threads locking node when reusing it Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:43 -04:00
Kent Overstreet	4fe7efa177	bcachefs: Fix an error path We were missing a 'goto retry' and continuing on with an error pointer. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:43 -04:00
Kent Overstreet	a2b5313a39	bcachefs: Fix a faulty assertion Now that updates to interior nodes are journalled, we shouldn't be checking topology of interior nodes until we've finished replaying updates to that node. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	fff899b1d9	bcachefs: Mark btree nodes as needing rewrite when not all replicas are RW This fixes a bug where recovery fails when one of the devices is read only. Also - consolidate the "must rewrite this node to insert it" behind a new btree node flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	937f503605	bcachefs: Use btree reserve when appropriate Whenever we're doing an update that has pointers, that generally means we need to do the update in order to release open bucket references - so we should be using the btree open bucket reserve. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	2ca88e5ad9	bcachefs: Btree key cache This introduces a new kind of btree iterator, cached iterators, which point to keys cached in a hash table. The cache also acts as a write cache - in the update path, we journal the update but defer updating the btree until the cached entry is flushed by journal reclaim. Cache coherency is for now up to the users to handle, which isn't ideal but should be good enough for now. These new iterators will be used for updating inodes and alloc info (the alloc and stripes btrees). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	bd2bb273a0	bcachefs: Don't deadlock when btree node reuse changes lock ordering Btree node lock ordering is based on the logical key. However, 'struct btree' may be reused for a different btree node under memory pressure. This patch uses the new six lock callback to check if a btree node is no longer the node we wanted to lock before blocking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	4efe71a646	bcachefs: Always give out journal pre-res if we already have one This is better than skipping the journal pre-reservation if we already have one - we should still account for the journal reservation we're going to have to get. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	966885ee40	bcachefs: Fix a linked list bug Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	40ca39b564	bcachefs: btree_update_nodes_written() requires alloc reserve Also, in the btree_update_start() path, if we already have a journal pre-reservation we don't want to take another - that's a deadlock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	00b8ccf707	bcachefs: Interior btree updates are now fully transactional We now update the alloc info (bucket sector counts) atomically with journalling the update to the interior btree nodes, and we also set new btree roots atomically with the journalled part of the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	c823c3390b	bcachefs: Factor out bch2_fs_btree_interior_update_init() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	bc970cecd8	bcachefs: Fix two more deadlocks Deadlock on shutdown: btree_update_nodes_written() unblocks btree nodes from being written; after doing so, it has to check if they were marked as needing to be written and if so kick off those writes - if that doesn't happen, we'll never release journal pins and shutdown will get stuck when flushing the journal. There was an error path where this didn't happen, because in the error path we don't actually want those btree nodes write to happen; however, we still have to kick off the write path so the journal pins get released. The btree write path checks if we're in a journal error state and doesn't do the actual write if we are. Also - there was another deadlock because btree_update_nodes_written() was taking the btree update off of the unwritten_list too soon - before getting a journal reservation, which could fail and have to be retried. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:39 -04:00
Kent Overstreet	5b6d505a77	bcachefs: Fix another deadlock in btree_update_nodes_written() We also can't be blocking on btree node write locks while holding btree_interior_update_lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:39 -04:00
Kent Overstreet	297604c923	bcachefs: Add a few tracepoints Transaction restart tracing should probably be overhauled at some point. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:38 -04:00
Kent Overstreet	58fb3e519a	bcachefs: Fix another deadlock in the btree interior update path Can't take read locks on btree nodes while holding btree_interior_update_lock. Also, fix a bug where we were leaking journal prereservations. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:38 -04:00
Kent Overstreet	0f9dda478f	bcachefs: Fix a deadlock on starting an interior btree update Not legal to block on a journal prereservation with btree locks held. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:38 -04:00
Kent Overstreet	1e3b1f9a22	bcachefs: Fix a debug mode assertion Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:38 -04:00
Kent Overstreet	8707ab0df2	bcachefs: Fix another error path locking bug btree_update_nodes_written() was leaking a btree node lock on failure to get a journal reservation. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:38 -04:00
Kent Overstreet	501e1bda3e	bcachefs: Fix journalling of interior node updates We weren't journalling updates done while splitting/compacting nodes - oops. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:38 -04:00
Kent Overstreet	a0e491c099	bcachefs: Don't allocate memory while holding journal reservation This fixes a lockdep splat - allocating memory can call bch2_clear_page_bits() which takes mark_lock. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:37 -04:00
Kent Overstreet	39fb2983c5	bcachefs: Kill bkey_type_successor Previously, BTREE_ID_INODES was special - inodes were indexed by the inode field, which meant the offset field of struct bpos wasn't used, which led to special cases in e.g. the btree iterator code. Now, inodes in the inodes btree are indexed by the offset field. Also: previously min_key was special for extents btrees, min_key for extents would equal max_key for the previous node. Now, min_key = bkey_successor() of the previous node, same as non extent btrees. This means we can completely get rid of btree_type_successor/predecessor. Also make some improvements to the metadata IO validate/compat code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:37 -04:00
Kent Overstreet	56a40fbc4e	bcachefs: Various fixes for interior update path The locking was wrong, and we could get a use after free in the error path where we weren't taking the entry being freed off the unwritten list. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:37 -04:00
Kent Overstreet	6357d6071f	bcachefs: Journal updates to interior nodes Previously, the btree has always been self contained and internally consistent on disk without anything from the journal - the journal just contained pointers to the btree roots. However, this meant that btree node split or compact operations - i.e. anything that changes btree node topology and involves updates to interior nodes - would require that interior btree node to be written immediately, which means emitting a btree node write that's mostly empty (using 4k of space on disk if the filesystem blocksize is 4k to only write perhaps ~100 bytes of new keys). More importantly, this meant most btree node writes had to be FUA, and consumer drives have a history of slow and/or buggy FUA support - other filesystems have been bitten by this. This patch changes the interior btree update path to journal updates to interior nodes, after the writes for the new btree nodes have completed. Best of all, it turns out to simplify the interior node update path somewhat. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:37 -04:00
Kent Overstreet	e62d65f2fb	bcachefs: trans_commit() path can now insert to interior nodes This will be needed for the upcoming patches to journal updates to interior btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:37 -04:00
Kent Overstreet	2dac0eae78	bcachefs: Iterator debug code improvements More aggressively checking iterator invariants, and fixing the resulting bugs. Also greatly simplifying iter_next() and iter_next_slot() - they were hyper optimized before, but the optimizations were getting too brittle. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:36 -04:00
Kent Overstreet	3f58a19763	bcachefs: Journal pin cleanups Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:36 -04:00
Kent Overstreet	00aad62aaf	bcachefs: Fix incorrect initialization of btree_node_old_extent_overwrite() b->level and b->btree_id weren't set when the code was checking btree_node_is_extents() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:35 -04:00
Kent Overstreet	ac7c51b218	bcachefs: Serialize btree_update operations at btree_update_nodes_written() Prep work for journalling updates to interior nodes - enforcing ordering will greatly simplify those changes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:35 -04:00
Kent Overstreet	548b3d209f	bcachefs: btree_ptr_v2 Add a new btree ptr type which contains the sequence number (random 64 bit cookie, actually) for that btree node - this lets us verify that when we read in a btree node it really is the btree node we wanted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:35 -04:00
Kent Overstreet	237e80483a	bcachefs: introduce b->hash_val This is partly prep work for introducing bch_btree_ptr_v2, but it'll also be a bit of a performance boost by moving the full key out of the hot part of struct btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:35 -04:00
Kent Overstreet	5525f632dc	bcachefs: Change btree split threshold to be in u64s This fixes a bug with very small btree nodes where splitting would end up with one of the new nodes empty. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:34 -04:00
Kent Overstreet	9626aeb167	bcachefs: Rework iter->pos handling - Rework some of the helper comparison functions for consistency - Currently trying to refactor all the logic that's different for extents in the btree iterator code. The main difference is that for non extents we search for a key greater than or equal to the search key, while for extents we search for a key strictly greater than the search key (iter->pos). So that logic is now handled by btree_iter_search_key(), which computes the real search key based on iter->pos and whether or not we're searching for a key >= or > iter->pos. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:34 -04:00
Kent Overstreet	2d594dfb53	bcachefs: Split out btree_trigger_flags The trigger flags really belong with individual btree_insert_entries, not the transaction commit flags - this splits out those flags and unifies them with the BCH_BUCKET_MARK flags. Todo - split out btree_trigger.c from buckets.c Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:33 -04:00
Kent Overstreet	bcd6f3e06f	bcachefs: Use KEY_TYPE_deleted whiteouts for extents Previously, partial overwrites of existing extents were handled implicitly by the btree code; when reading in a btree node, we'd do a mergesort of the different bsets and detect and fix partially overlapping extents during that mergesort. That approach won't work with snapshots: this changes extents to work like regular keys as far as the btree code is concerned, where a 0 size KEY_TYPE_deleted whiteout will completely overwrite an existing extent. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:33 -04:00
Kent Overstreet	27b3e52388	bcachefs: Add an assertion to track down a heisenbug Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:33 -04:00
Kent Overstreet	ad44bdc351	bcachefs: bkey noops For upcoming inline data extents, we're going to need to be able to shorten the value of existing bkeys in the btree - and to make that work we're going to need to be able to pad out the space the value previously took up with something. This patch changes the various code that iterates over bkeys to handle k->u64s == 0 as meaning "skip the next 8 bytes". Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:32 -04:00
Justin Husted	928c839cc9	bcachefs: Initialize btree_node flags field in bch2_btree_root_alloc. Valgrind data indicated that the flags field was only partially initialized when written to disk. Signed-off-by: Justin Husted <sigstop@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:30 -04:00
Kent Overstreet	ea3532cbf7	bcachefs: Fix a subtle race in the btree split path We have to free the old (in memory) btree node _before_ unlocking the new nodes - else, some other thread with a read lock on the old node could see stale data after another thread has already updated the new node. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:29 -04:00
Kent Overstreet	2cbe5cfe27	bcachefs: Rework calling convention for marking overwrites Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:25 -04:00
Kent Overstreet	6e738539cd	bcachefs: Improve key marking interface Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:22 -04:00
Kent Overstreet	20bceecb31	bcachefs: More work to avoid transaction restarts Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:22 -04:00
Kent Overstreet	58fbf80834	bcachefs: Delete duplicate code Also rename for consistency Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:22 -04:00
Kent Overstreet	ed8413fdab	bcachefs: improved btree locking tracepoints Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:22 -04:00

89 Commits