247 Commits

Author SHA1 Message Date
Kent Overstreet
994ba47543 bcachefs: New btree helpers
This introduces some new conveniences, to help cut down on boilerplate:

 - bch2_trans_kmalloc_nomemzero() - performance optimiation
 - bch2_bkey_make_mut()
 - bch2_bkey_get_mut()
 - bch2_bkey_get_mut_typed()
 - bch2_bkey_alloc()

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:48 -04:00
Kent Overstreet
8852501fe5 bcachefs: Improve fs_usage_apply_warn() message
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
df6a24f81a bcachefs: Make error messages more uniform
Use __func__ in error messages that refer to function name, and do so
more uniformly - suggested by checkpatch.pl

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:45 -04:00
Kent Overstreet
3e3e02e6bc bcachefs: Assorted checkpatch fixes
checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:

 ASSIGN_IN_IF
 UNSPECIFIED_INT	- bcachefs coding style prefers single token type names
 NEW_TYPEDEFS		- typedefs are occasionally good
 FUNCTION_ARGUMENTS	- we prefer to look at functions in .c files
			  (hopefully with docbook documentation), not .h
			  file prototypes
 MULTISTATEMENT_MACRO_USE_DO_WHILE
			- we have _many_ x-macros and other macros where
			  we can't do this

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Kent Overstreet
ed80c5699a bcachefs: Optimize bch2_dev_usage_read()
- add bch2_dev_usage_read_fast(), which doesn't return by value -
   bch_dev_usage is big enough that we don't want the silent memcpy
 - tweak the allocation path to only call bch2_dev_usage_read() once per
   bucket allocated

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:44 -04:00
Kent Overstreet
5b3243cb52 bcachefs: Fix cached data accounting
Negating without casting to a signed integer means the value wasn't
getting sign extended properly - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:43 -04:00
Kent Overstreet
6c22eb7085 bcachefs: Fix "multiple types of data in same bucket" with ec
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
098ef98d5b bcachefs: Add private error codes for ENOSPC
Continuing the saga of introducing private dedicated error codes for
each error path, this patch converts ENOSPC to error codes that are
subtypes of ENOSPC. We've recently had a test failure where we got
-ENOSPC where we shouldn't have, and didn't have enough information to
tell where it came from, so this patch will solve that problem.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
dadecd02c4 bcachefs: bch2_trans_run()
This adds a new helper, bch2_trans_run(), that runs a function with a
btree_transaction context but without handling transaction restarts.
We're adding checks for nested transaction restart handling: when an
inner transaction handles a transaction restart it will still have to
return it to the outer transaction, or else assertions will be popped in
the outer transaction.

But some places don't need restart handling at the outer scope, so this
helper does what they need.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:36 -04:00
Kent Overstreet
f501ad2b81 bcachefs: bch2_mark_alloc(): Do wakeups after updating usage
We have an obvious wake up race if we do the wakeup _before_ updating
the counters the thing doing the waiting is reading.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:36 -04:00
Kent Overstreet
e68914ca84 bcachefs: Rename __bch2_trans_do() -> commit_do()
Better/more descriptive naming, and prep for adding
nested_lockrestart_do() and nested_commit_do().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:35 -04:00
Kent Overstreet
401ec4db63 bcachefs: Printbuf rework
This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:33 -04:00
Kent Overstreet
1f93726e63 bcachefs: Tracepoint improvements
Delete some obsolete tracepoints, organize alloc tracepoints better,
make a few tracepoints more consistent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Kent Overstreet
e1b8f5f5ca bcachefs: Plumb btree_id & level to trans_mark
For backpointers, we'll need the full key location - that means btree_id
and btree level. This patch plumbs it through.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:32 -04:00
Kent Overstreet
75c8d0305a bcachefs: Kill old rebuild_replicas option
This option was useful when the replicas mechism was new and still being
debugged, but hasn't been used in ages - let's delete it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:31 -04:00
Kent Overstreet
822835ffea bcachefs: Fold bucket_state in to BCH_DATA_TYPES()
Previously, we were missing accounting for buckets in need_gc_gens and
need_discard states. This matters because buckets in those states need
other btree operations done before they can be used, so they can't be
conuted when checking current number of free buckets against the
allocation watermark.

Also, we weren't directly counting free buckets at all. Now, data type 0
== BCH_DATA_free, and free buckets are counted; this means we can get
rid of the separate (poorly defined) count of unavailable buckets.

This is a new on disk format version, with upgrade and fsck required for
the accounting changes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet
e1effd42a1 bcachefs: More improvements for alloc info checks
- Move checks for whether the device & bucket are valid from the
   .key_invalid method to bch2_check_alloc_key(). This is because
   .key_invalid() is called on keys that may no longer exist (post
   journal replay), which is a problem when removing/resizing devices.

 - We weren't checking the need_discard btree to ensure that every set
   bucket has a corresponding alloc key. This refactors the code for
   checking the freespace btree, so that it now checks both.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet
c6b6d41612 bcachefs: gc mark fn fixes, cleanups
mark_stripe_bucket() was busted; it was using @new unitialized.

Also, clean up all the gc mark functions, and convert them to the same
style.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet
75f02de43f bcachefs: Use crc_is_compressed()
Trivial cleanup.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:30 -04:00
Kent Overstreet
66d9082385 bcachefs: Kill struct bucket_mark
This switches struct bucket to using a lock, instead of cmpxchg. And now
that the protected members no longer need to fit into a u64, we can
expand the sector counts to 32 bits.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
5735608c14 bcachefs: Kill main in-memory bucket array
All code using the in-memory bucket array, excluding GC, has now been
converted to use the alloc btree directly - so we can finally delete it.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
5f43f99c6e bcachefs: bch2_dev_usage_update() no longer depends on bucket_mark
This is one of the last steps in getting rid of the main in-memory
bucket array.

This changes bch2_dev_usage_update() to take bkey_alloc_unpacked instead
of bucket_mark, and for the places where we are in fact working with
bucket_mark and don't have bkey_alloc_unpacked, we add a wrapper that
takes bucket_mark and converts to bkey_alloc_unpacked.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
caece7fe3f bcachefs: New bucket invalidate path
In the old allocator code, preparing an existing empty bucket was part
of the same code path that invalidated buckets containing cached data.
In the new allocator code this is no longer the case: the main allocator
path finds empty buckets (via the new freespace btree), and can't
allocate buckets that contain cached data.

We now need a separate code path to invalidate buckets containing cached
data when we're low on empty buckets, which this patch implements. When
the number of free buckets decreases that triggers the new invalidate
path to run, which uses the LRU btree to pick cached data buckets to
invalidate until we're above our watermark.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
59cc38b8d4 bcachefs: New discard implementation
In the old allocator code, buckets would be discarded just prior to
being used - this made sense in bcache where we were discarding buckets
just after invalidating the cached data they contain, but in a
filesystem where we typically have more free space we want to be
discarding buckets when they become empty.

This patch implements the new behaviour - it checks the need_discard
btree for buckets awaiting discards, and then clears the appropriate
bit in the alloc btree, which moves the buckets to the freespace btree.

Additionally, discards are now enabled by default.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
f25d8215f4 bcachefs: Kill allocator threads & freelists
Now that we have new persistent data structures for the allocator, this
patch converts the allocator to use them.

Now, foreground bucket allocation uses the freespace btree to find
buckets to allocate, instead of popping buckets off the freelist.

The background allocator threads are no longer needed and are deleted,
as well as the allocator freelists. Now we only need background tasks
for invalidating buckets containing cached data (when we are low on
empty buckets), and for issuing discards.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
c6b2826cd1 bcachefs: Freespace, need_discard btrees
This adds two new btrees for the upcoming allocator rewrite: an extents
btree of free buckets, and a btree for buckets awaiting discards.

We also add a new trigger for alloc keys to keep the new btrees up to
date, and a compatibility path to initialize them on existing
filesystems.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
3d48a7f85f bcachefs: KEY_TYPE_alloc_v4
This introduces a new alloc key which doesn't use varints. Soon we'll be
adding backpointers and storing them in alloc keys, which means our
pack/unpack workflow for alloc keys won't really work - we'll need to be
mutating alloc keys in place.

Instead of bch2_alloc_unpack(), we now have bch2_alloc_to_v4() that
converts older types of alloc keys to v4 if needed.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
3e1547116f bcachefs: x-macroize alloc_reserve enum
This makes an array of strings available, like our other enums.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:29 -04:00
Kent Overstreet
78668fe0bb bcachefs: Move deletion of refcount=0 indirect extents to their triggers
For backpointers, we need to switch the order triggers are run in: we
need to run triggers for deletions/overwrites before triggers for
inserts.

To avoid breaking the reflink triggers, this patch moves deleting of
indirect extents with refcount=0 to their triggers, instead of doing it
when we update those keys.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:29 -04:00
Kent Overstreet
880e2275f9 bcachefs: Move trigger fns to bkey_ops
This replaces the switch statements in bch2_mark_key(),
bch2_trans_mark_key() with new bkey methods - prep work for the next
patch, which fixes BTREE_TRIGGER_WANTS_OLD_AND_NEW.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:28 -04:00
Kent Overstreet
2158fe463b bcachefs: bch2_trans_inconsistent()
Add a new error macro that also dumps transaction updates in addition to
doing an emergency shutdown - when a transaction update discovers or is
causing a fs inconsistency, it's helpful to see what updates it was
doing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:27 -04:00
Kent Overstreet
fa8e94faee bcachefs: Heap allocate printbufs
This patch changes printbufs dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been static size string buffers on the stack.

The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:25 -04:00
Kent Overstreet
3598c56eb9 bcachefs: Consolidate trigger code a bit
Upcoming patches are doing more work on the triggers code, this patch
just moves code around.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:25 -04:00
Kent Overstreet
ae94c78fb1 bcachefs: bch2_trans_mark_key() now takes a bkey_i *
We're now coming up with triggers that modify the update being done. A
bkey_s_c is const - bkey_i is the correct type to be using here.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:25 -04:00
Kent Overstreet
b0551285e1 bcachefs: Improve reflink repair code
When a reflink pointer points to a missing indirect extent, we replace
it with an error key. Instead of replacing the entire reflink pointer
with an error key, this patch replaces only the missing range with an
error key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:25 -04:00
Kent Overstreet
78c8fe20be bcachefs: Normal update/commit path now works before going RW
This improves __bch2_trans_commit - early in the recovery process, when
we're running btree_gc and before we want to go RW, it now uses
bch2_journal_key_insert() to add the update to the list of updates for
journal replay to do, instead of btree_gc having to use separate
interfaces depending on whether we're running at bringup or, later,
runtime.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:25 -04:00
Kent Overstreet
2232fa397c bcachefs: Only allocate buckets_nouse when requested
It's only needed by the migrate tool - this patch adds an option to
enable allocating it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:24 -04:00
Kent Overstreet
2e63e18066 bcachefs: Stash a copy of key being overwritten in btree_insert_entry
We currently need to call bch2_btree_path_peek_slot() multiple times in
the transaction commit path - and some of those need to be updated to
also check the keys from journal replay, too. Let's consolidate this and
stash the key being overwritten in btree_insert_entry.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:23 -04:00
Kent Overstreet
80bf2f3454 bcachefs: Fix freeing in bch2_dev_buckets_resize()
We were double-freeing old_buckets and not freeing old_buckets_gens:
also, the code was supposed to free buckets, not old_buckets;
old_buckets is only needed because we have to use rcu_assign_pointer()
instead of swap(), and won't be set if we hit the error path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:23 -04:00
Kent Overstreet
0678cbe2cb bcachefs: Ignore cached data when calculating fragmentation
Previously, bucket fragmentation was considered to be bucket size -
total amount of live data, both dirty and cached.

This meant that if a bucket was full but only a small amount of data in
it was dirty - the rest cached, we'd get stuck: copygc wouldn't move the
dirty data out of the bucket and the allocator wouldn't be able to
invalidate and drop the cached data.

This changes fragmentation to exclude cached data, so that copygc will
evacuate these buckets and copygc/the allocator will always be able to
make forward progress.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:22 -04:00
Kent Overstreet
3763cb9566 bcachefs: Don't use in-memory bucket array for alloc updates
More prep work for getting rid of the in-memory bucket array: now that
we have BTREE_ITER_WITH_JOURNAL, the allocator code can do ntree lookups
before journal replay is finished, and there's no longer any need for it
to get allocation information from the in-memory bucket array.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:22 -04:00
Kent Overstreet
21aec962df bcachefs: New data structure for buckets waiting on journal commit
Implement a hash table, using cuckoo hashing, for empty buckets that are
waiting on a journal commit before they can be reused.

This replaces the journal_seq field of bucket_mark, and is part of
eventually getting rid of the in memory bucket array.

We may need to make bch2_bucket_needs_journal_commit() lockless, pending
profiling and testing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:22 -04:00
Kent Overstreet
f443fa66c9 bcachefs: Also print out in-memory gen on stale dirty pointer
We're trying to track down a bug that shows itself as newly-created
extents having stale dirty pointers - possibly due to the in memory gen
and the btree gen being inconsistent. This patch changes the error
message to also print out the in memory bucket gen when this happens.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:22 -04:00
Kent Overstreet
f0f41a6d74 bcachefs: Add error messages for memory allocation failures
This adds some missing diagnostics from rare but annoying to debug
runtime allocation failure paths.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:20 -04:00
Kent Overstreet
e3ad29379e bcachefs: Optimize bucket reuse
If the btree updates pointing to a bucket were never flushed by the
journal before the bucket became empty again, we can reuse the bucket
without a journal flush.

This tweaks the tracking of journal sequence numbers in alloc keys to
implement this optimization: now, we only update the journal sequence
number in alloc keys on transitions to and from empty. When a bucket
becomes empty, we check if we can tell the journal not to flush entries
starting from when the bucket was used.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:20 -04:00
Kent Overstreet
13f914ecb9 bcachefs: Kill bch2_ec_mem_alloc()
bch2_ec_mem_alloc() was only used by GC, and there's no real need to
preallocate the stripes radix tree since we can cope fine with memory
allocation failure when we use the radix tree. This deletes a fair bit
of code, and it's also needed for the upcoming patch because
bch2_btree_iter_peek_prev() won't be working before journal replay
completes (and using it was incorrect previously, as well).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:20 -04:00
Kent Overstreet
36f035e908 bcachefs: Fix allocator + journal interaction
The allocator needs to wait until the last update touching a bucket has
been commited before writing to it again. However, the code was checking
against the last dirty journal sequence number, not the last flushed
journal sequence number.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:20 -04:00
Kent Overstreet
a786087744 bcachefs: New in-memory array for bucket gens
The main in-memory bucket array is going away, but we'll still need to
keep bucket generations in memory, at least for now - ptr_stale() needs
to be an efficient operation.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:20 -04:00
Kent Overstreet
47ac34ec98 bcachefs: Separate out gc_bucket()
Since the main in memory bucket array is going away, we don't want to be
calling bucket() or __bucket() when what we want is the GC in-memory
bucket.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:20 -04:00
Kent Overstreet
e75b2d4c1c bcachefs: bch2_journal_key_insert() no longer transfers ownership
bch2_journal_key_insert() used to assume that the key passed to it was
allocated with kmalloc(), and on success took ownership. This patch
deletes that behaviour, making it more similar to
bch2_trans_update()/bch2_trans_commit().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:19 -04:00