58 Commits

Author SHA1 Message Date
Kent Overstreet
2dac0eae78 bcachefs: Iterator debug code improvements
More aggressively checking iterator invariants, and fixing the resulting
bugs. Also greatly simplifying iter_next() and iter_next_slot() - they
were hyper-optimized before, but the optimizations were getting too
brittle.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:36 -04:00
Kent Overstreet
3f58a19763 bcachefs: Journal pin cleanups
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:36 -04:00
Kent Overstreet
00aad62aaf bcachefs: Fix incorrect initialization of btree_node_old_extent_overwrite()
b->level and b->btree_id weren't yet set when the code was checking
btree_node_is_extents().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:35 -04:00
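
The fix is one of ordering: btree_node_is_extents() reads b->level and b->btree_id, so those fields have to be assigned before it is consulted. A minimal sketch of the idea, using simplified, hypothetical types rather than the real bcachefs structures:

#include <stdbool.h>
#include <stdint.h>

/* hypothetical, simplified stand-ins for the real bcachefs types: */
enum btree_id { BTREE_ID_extents, BTREE_ID_inodes };

struct btree {
    uint8_t       level;
    enum btree_id btree_id;
    bool          old_extent_overwrite;
};

bool btree_node_is_extents(const struct btree *b)
{
    /* only meaningful once level and btree_id have been set */
    return b->level == 0 && b->btree_id == BTREE_ID_extents;
}

void btree_node_init(struct btree *b, enum btree_id id, uint8_t level)
{
    /* assign the fields the predicate reads *before* consulting it: */
    b->level    = level;
    b->btree_id = id;

    b->old_extent_overwrite = btree_node_is_extents(b);
}
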
Kent Overstreet
ac7c51b218 bcachefs: Serialize btree_update operations at btree_update_nodes_written()
Prep work for journalling updates to interior nodes - enforcing ordering
will greatly simplify those changes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:35 -04:00
Kent Overstreet
548b3d209f bcachefs: btree_ptr_v2
Add a new btree ptr type which contains the sequence number (random
64-bit cookie, actually) for that btree node - this lets us verify that
when we read in a btree node it really is the btree node we wanted.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:35 -04:00
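
A hedged sketch of the check this enables, with simplified stand-in structures (the real bch_btree_ptr_v2 carries more fields than this): the pointer records the node's random cookie, and a freshly read node must echo the same value back.

#include <stdbool.h>
#include <stdint.h>

/* simplified, hypothetical versions of the on-disk structures: */
struct btree_ptr_v2 {
    uint64_t seq;     /* random cookie of the node this pointer expects */
    uint64_t offset;  /* where the node lives on disk */
};

struct btree_node_header {
    uint64_t seq;     /* cookie written when the node was created */
};

/*
 * After reading the node at ptr->offset, verify it really is the node the
 * pointer referred to - a stale or misdirected read has a different cookie:
 */
bool btree_node_matches_ptr(const struct btree_node_header *hdr,
                            const struct btree_ptr_v2 *ptr)
{
    return hdr->seq == ptr->seq;
}
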
Kent Overstreet
237e80483a bcachefs: introduce b->hash_val
This is partly prep work for introducing bch_btree_ptr_v2, but it'll
also be a bit of a performance boost by moving the full key out of the
hot part of struct btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:35 -04:00
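
A rough illustration of the layout change, with invented field names and a made-up hash function: keep a precomputed hash of the node's key in the hot part of struct btree, since that is all the btree node hash table lookups need to compare, and let the full key live in the colder tail of the struct.

#include <stdint.h>

struct bpos { uint64_t inode, offset; };  /* simplified key position */

struct btree {
    /* hot: what hash table lookups actually compare */
    uint64_t    hash_val;
    uint8_t     level;

    /* ... other frequently accessed fields ... */

    /* cold: the full key, only needed occasionally */
    struct bpos key;
};

/* made-up hash; the real code hashes the node's full bkey */
uint64_t btree_ptr_hash_val(struct bpos pos, uint8_t level)
{
    uint64_t h = pos.inode * 0x9E3779B97F4A7C15ULL;

    h ^= pos.offset + level;
    return h;
}
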
Kent Overstreet
5525f632dc bcachefs: Change btree split threshold to be in u64s
This fixes a bug with very small btree nodes where splitting would end
up with one of the new nodes empty.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:34 -04:00
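
A minimal sketch of what a u64s-based split threshold looks like, with a hypothetical helper and simplified types (the real code works on packed bsets): accumulate key sizes in u64s and cut at roughly half of the total, clamping so that neither new node ends up empty when there are at least two keys.

#include <stddef.h>
#include <stdint.h>

struct bkey { uint8_t u64s; /* size of key + value, in u64s */ };

size_t pick_split_idx(const struct bkey *keys, size_t nr)
{
    size_t total = 0, acc = 0, i;

    if (nr < 2)
        return nr;   /* nothing sensible to split */

    for (i = 0; i < nr; i++)
        total += keys[i].u64s;

    /* cut once roughly half of the u64s are on the left: */
    for (i = 0; acc < total / 2 && i < nr; i++)
        acc += keys[i].u64s;

    /* never leave either side empty: */
    if (i == 0)
        i = 1;
    if (i == nr)
        i = nr - 1;
    return i;
}
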
Kent Overstreet
9626aeb167 bcachefs: Rework iter->pos handling
- Rework some of the helper comparison functions for consistency

- Continue refactoring the logic that's different for extents in the
btree iterator code. The main difference is that for non-extents we
search for a key greater than or equal to the search key, while for
extents we search for a key strictly greater than the search key
(iter->pos).

So that logic is now handled by btree_iter_search_key(), which computes
the real search key based on iter->pos and whether or not we're
searching for a key >= or > iter->pos.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:34 -04:00
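
A sketch of the idea behind btree_iter_search_key(), with simplified types and an assumed successor helper: for extents, searching for a key strictly greater than iter->pos is the same as searching for a key greater than or equal to the successor of iter->pos, so both cases reduce to the same ">= search key" lookup.

#include <stdint.h>

struct bpos { uint64_t inode, offset; };

#define BTREE_ITER_IS_EXTENTS (1 << 0)

struct btree_iter {
    struct bpos pos;
    unsigned    flags;
};

/* successor of a position in the (inode, offset) total order: */
struct bpos bpos_successor(struct bpos p)
{
    if (++p.offset == 0)
        p.inode++;
    return p;
}

/*
 * Non-extents want the first key >= iter->pos; extents want the first key
 * > iter->pos, which is the same as >= successor(iter->pos):
 */
struct bpos btree_iter_search_key(const struct btree_iter *iter)
{
    return iter->flags & BTREE_ITER_IS_EXTENTS
        ? bpos_successor(iter->pos)
        : iter->pos;
}
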
Kent Overstreet
2d594dfb53 bcachefs: Split out btree_trigger_flags
The trigger flags really belong with individual btree_insert_entries,
not the transaction commit flags - this splits out those flags and
unifies them with the BCH_BUCKET_MARK flags. Todo: split out
btree_trigger.c from buckets.c.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Kent Overstreet
bcd6f3e06f bcachefs: Use KEY_TYPE_deleted whiteouts for extents
Previously, partial overwrites of existing extents were handled
implicitly by the btree code; when reading in a btree node, we'd do a
mergesort of the different bsets and detect and fix partially
overlapping extents during that mergesort.

That approach won't work with snapshots: this changes extents to work
like regular keys as far as the btree code is concerned, where a 0 size
KEY_TYPE_deleted whiteout will completely overwrite an existing extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
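
A simplified sketch of the new model, with hypothetical types: an extent is keyed by its end position, and overwriting one no longer involves partial-overlap fixups during the mergesort - a newer key at the same position, including a size-0 KEY_TYPE_deleted whiteout, simply replaces it.

#include <stdbool.h>
#include <stdint.h>

enum bkey_type { KEY_TYPE_deleted, KEY_TYPE_extent };

struct bkey {
    enum bkey_type type;
    uint64_t       inode;
    uint64_t       offset;  /* position: the end of the extent */
    uint32_t       size;    /* 0 for a whiteout */
};

/* the whiteout that completely overwrites an existing extent: */
struct bkey extent_whiteout(const struct bkey *e)
{
    struct bkey w = {
        .type   = KEY_TYPE_deleted,
        .inode  = e->inode,
        .offset = e->offset,
        .size   = 0,
    };
    return w;
}

/*
 * During the in-memory mergesort of bsets, overwrite resolution is now
 * just "same position, newer bset wins" - no trimming of partially
 * overlapping extents:
 */
bool newer_key_overwrites(const struct bkey *older, const struct bkey *newer)
{
    return newer->inode == older->inode &&
           newer->offset == older->offset;
}
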
Kent Overstreet
27b3e52388 bcachefs: Add an assertion to track down a heisenbug
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:33 -04:00
Kent Overstreet
ad44bdc351 bcachefs: bkey noops
For upcoming inline data extents, we're going to need to be able to
shorten the value of existing bkeys in the btree - and to make that work
we're going to need to be able to pad out the space the value previously
took up with something.

This patch changes the various code that iterates over bkeys to handle
k->u64s == 0 as meaning "skip the next 8 bytes".

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:32 -04:00
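
A sketch of how iteration handles that padding, with a simplified key header (the real code operates on packed bkeys): keys are laid out end to end as arrays of u64s, and a header with u64s == 0 is a one-u64 noop to be skipped.

#include <stdint.h>

/* simplified key header; key + value payload follows, k->u64s total u64s */
struct bkey {
    uint8_t u64s;
    uint8_t type;
};

/*
 * Advance to the next real key, skipping any noop padding (u64s == 0 means
 * "skip the next 8 bytes") left behind when a value was shortened in place:
 */
struct bkey *bkey_next_skip_noops(struct bkey *k, struct bkey *end)
{
    do {
        k = (struct bkey *) ((uint64_t *) k + (k->u64s ? k->u64s : 1));
    } while (k < end && !k->u64s);

    return k;
}
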
Justin Husted
928c839cc9 bcachefs: Initialize btree_node flags field in bch2_btree_root_alloc.
Valgrind data indicated that the flags field was only partially
initialized when written to disk.

Signed-off-by: Justin Husted <sigstop@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:30 -04:00
Kent Overstreet
ea3532cbf7 bcachefs: Fix a subtle race in the btree split path
We have to free the old (in memory) btree node _before_ unlocking the
new nodes - else, some other thread with a read lock on the old node
could see stale data after another thread has already updated the new
node.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:29 -04:00
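
A sketch of the ordering, with stand-in helpers for the real btree node cache and six-lock primitives: the old in-memory node is invalidated before the write locks on its replacements are dropped, so no reader can still find the old node once the new ones are visible.

#include <stdio.h>

struct btree { const char *name; };

/* stand-ins for the real six-lock and btree node cache primitives: */
void btree_node_free_inmem(struct btree *b) { printf("free   %s\n", b->name); }
void six_unlock_write(struct btree *b)      { printf("unlock %s\n", b->name); }

void btree_split_finish(struct btree *old, struct btree *n1, struct btree *n2)
{
    /*
     * Free (invalidate) the old in-memory node first, so another thread
     * can't take a read lock on it and see stale data...
     */
    btree_node_free_inmem(old);

    /*
     * ...and only then unlock the new nodes, publishing the updated
     * contents:
     */
    six_unlock_write(n1);
    six_unlock_write(n2);
}
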
Kent Overstreet
2cbe5cfe27 bcachefs: Rework calling convention for marking overwrites
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:25 -04:00
Kent Overstreet
6e738539cd bcachefs: Improve key marking interface
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet
20bceecb31 bcachefs: More work to avoid transaction restarts
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet
58fbf80834 bcachefs: Delete duplicate code
Also rename for consistency

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet
ed8413fdab bcachefs: improved btree locking tracepoints
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet
b03b81dfd2 bcachefs: Don't pass around may_drop_locks
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:22 -04:00
Kent Overstreet
b7607ce98f bcachefs: Kill remaining bch2_btree_iter_unlock() uses
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:21 -04:00
Kent Overstreet
c43a6ef9a0 bcachefs: btree_bkey_cached_common
This is prep work for the btree key cache: btree iterators will point to
either struct btree, or a new struct bkey_cached.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:21 -04:00
Kent Overstreet
5e82a9a1f4 bcachefs: Write out fs usage consistently
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:21 -04:00
Kent Overstreet
36e916e13b bcachefs: Caller now responsible for calling mark_key for gc
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:19 -04:00
Kent Overstreet
a2b6b0729e bcachefs: add missing bch2_btree_iter_node_drop() call
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:19 -04:00
Kent Overstreet
0f23836771 bcachefs: trans_for_each_iter()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:18 -04:00
Kent Overstreet
dc3b63dc33 bcachefs: Add time stats for btree updates
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:18 -04:00
Kent Overstreet
4d8100daa9 bcachefs: Allocate fs_usage in do_btree_insert_at()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:18 -04:00
Kent Overstreet
8c96cfccf0 bcachefs: fix more locking bugs
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet
2ecc6171a3 bcachefs: Fix double counting when gc is running
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet
39fbc5a49f bcachefs: gc lock no longer needed for disk reservations
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet
42b72e0ba2 bcachefs: journal_replay_early()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:15 -04:00
Kent Overstreet
430735cd1a bcachefs: Persist alloc info on clean shutdown
- Does not persist alloc info for stripes yet
- Also does not yet include filesystem block/sector counts from
struct fs_usage
- Not made use of just yet

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:14 -04:00
Kent Overstreet
7ef2a73a58 bcachefs: Fix check for if extent update is allocating
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:14 -04:00
Kent Overstreet
d0cc3defba bcachefs: More allocator startup improvements
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:14 -04:00
Kent Overstreet
9166b41db1 bcachefs: s/usage_lock/mark_lock
Better describes what it's for, and we're going to call a new lock
usage_lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:13 -04:00
Kent Overstreet
eb8632657f bcachefs: drop bogus percpu_ref_tryget
caller should already be guarding against rw, and checking here breaks
when caller needs to finish updates for going RO

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet
26609b619f bcachefs: Make bkey types globally unique
This lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on-disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
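
An illustrative sketch, with a made-up subset of types: because every key type now comes from one global enum, code that used to switch on the btree ID and then on a per-btree key type can dispatch with a single switch on k->type.

/* illustrative subset of a single global key type enum: */
enum bch_bkey_type {
    KEY_TYPE_deleted,
    KEY_TYPE_extent,
    KEY_TYPE_inode,
    KEY_TYPE_dirent,
    KEY_TYPE_alloc,
};

struct bkey { enum bch_bkey_type type; };

/* one switch on k->type, no outer switch on the btree ID: */
const char *bkey_type_name(const struct bkey *k)
{
    switch (k->type) {
    case KEY_TYPE_deleted: return "deleted";
    case KEY_TYPE_extent:  return "extent";
    case KEY_TYPE_inode:   return "inode";
    case KEY_TYPE_dirent:  return "dirent";
    case KEY_TYPE_alloc:   return "alloc";
    default:               return "(unknown)";
    }
}
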
Kent Overstreet
eeb83e25bb bcachefs: Hold usage_lock over mark_key and fs_usage_apply
Fixes an inconsistency at the end of gc

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet
ad7ae8d63f bcachefs: Btree locking fix, refactoring
Hit an assertion, probably spurious, indicating an iterator was unlocked
when it shouldn't have been (spurious because it wasn't locked at all
when the caller called btree_insert_at()).

Add a flag, BTREE_ITER_NOUNLOCK, and tighten up the assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet
9ca53b55f7 bcachefs: gc now operates on second set of bucket marks
This means we can now use gc to verify the allocation information -
important for testing persistent alloc info.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet
f1a79365a7 bcachefs: Don't block on journal reservation with btree locks held
Fixes a deadlock between the allocator thread, when it first starts up,
and journal replay

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:11 -04:00
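
A sketch of the pattern this implies, with entirely hypothetical helper names: take the journal reservation without blocking while btree locks are held; if that fails, drop the btree locks, wait for journal space, and relock, restarting the transaction if the relock fails.

#include <stdbool.h>

/* hypothetical stand-ins for the real transaction/journal interfaces: */
struct btree_trans { bool journal_has_space; };

bool journal_res_get_nonblock(struct btree_trans *t) { return t->journal_has_space; }
void trans_unlock(struct btree_trans *t)             { (void) t; }
void journal_res_get_blocking(struct btree_trans *t) { t->journal_has_space = true; }
bool trans_relock(struct btree_trans *t)             { (void) t; return true; }

/*
 * Never wait for journal space while holding btree node locks - that's the
 * ordering that can deadlock against journal replay and the allocator:
 */
int trans_commit_get_journal_res(struct btree_trans *t)
{
    if (journal_res_get_nonblock(t))
        return 0;                    /* fast path: no waiting needed */

    trans_unlock(t);                 /* drop btree locks before blocking */
    journal_res_get_blocking(t);     /* now it's safe to wait */

    return trans_relock(t) ? 0 : -1; /* -1: caller restarts the transaction */
}
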
Kent Overstreet
cd575ddf57 bcachefs: Erasure coding
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:11 -04:00
Kent Overstreet
319f9ac38e bcachefs: revamp to_text methods
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:11 -04:00
Kent Overstreet
47799326bc bcachefs: more key marking refactoring
prep work for erasure coding

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:10 -04:00
Kent Overstreet
103e212785 bcachefs: replicas: prep work for stripes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:10 -04:00
Kent Overstreet
ef337c54c6 bcachefs: Allocation code refactoring
bch2_alloc_sectors_start() was a nightmare to work with - it's got some
tricky stuff to do, since it wants to use the buckets the write point
already has, unless they're not in the target it wants to write to,
unless it can't allocate from any other devices, in which case it will
use those buckets if it has to - et cetera.

This restructures the code to start with a new empty list of open
buckets we're going to use for the new allocation, pulling buckets from
the write point's list as we decide that we really are going to use
them - making the code somewhat more functional and drastically easier
to understand.

Also fixes a bug where we could end up waiting on c->freelist_wait
(because allocating from one device failed) but return success from
bch2_bucket_alloc(), because allocating from a different device
succeeded.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:10 -04:00
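
A simplified sketch of the first part of this restructuring, with hypothetical types and sizes: rather than mutating the write point's bucket list in place, start from an empty list for the new allocation and move buckets over only once we've decided to use them (here, only if their device is in the target).

#include <stdbool.h>
#include <stddef.h>

#define OB_LIST_MAX 8

/* simplified, hypothetical open-bucket bookkeeping: */
struct open_bucket { int dev; unsigned sectors_free; };

struct ob_list {
    struct open_bucket *v[OB_LIST_MAX];
    size_t              nr;
};

bool dev_in_target(int dev, const int *target, size_t target_nr)
{
    for (size_t i = 0; i < target_nr; i++)
        if (target[i] == dev)
            return true;
    return false;
}

/*
 * Build up the allocation's own list of open buckets, pulling buckets off
 * the write point only as we decide we really are going to use them;
 * buckets outside the target stay on the write point:
 */
void pick_writepoint_buckets(struct ob_list *wp, struct ob_list *alloc,
                             const int *target, size_t target_nr)
{
    size_t dst = 0;

    for (size_t i = 0; i < wp->nr; i++) {
        struct open_bucket *ob = wp->v[i];

        if (alloc->nr < OB_LIST_MAX &&
            dev_in_target(ob->dev, target, target_nr))
            alloc->v[alloc->nr++] = ob;   /* use it for this allocation */
        else
            wp->v[dst++] = ob;            /* leave it on the write point */
    }
    wp->nr = dst;
}

The second fix mentioned above is purely about error propagation and isn't sketched here: if allocating from any device left us waiting on c->freelist_wait, bch2_bucket_alloc must not report overall success just because another device's allocation succeeded.
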
Kent Overstreet
7b3f84ea7d bcachefs: Split out alloc_background.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:10 -04:00
Kent Overstreet
34b8e55276 bcachefs: Fix a deadlock
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:09 -04:00
Kent Overstreet
a00fd8c535 bcachefs: Comparison function cleanups
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:09 -04:00