113 Commits

Author SHA1 Message Date
Kent Overstreet
69294246b7 bcachefs: Fix allocator shutdown error message
We return 1 to indicate kthread_should_stop() returned true - we
shouldn't be printing an error.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:13 -04:00
Kent Overstreet
67e0dd8f0d bcachefs: btree_path
This splits btree_iter into two components: btree_iter is now the
externally visible componont, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:11 -04:00
Kent Overstreet
8b3e9bd65f bcachefs: Always check for transaction restarts
On transaction restart iterators won't be locked anymore - make sure
we're always checking for errors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
8d34458781 bcachefs: Add safe versions of varint encode/decode
This adds safe versions of bch2_varint_(encode|decode) that don't read
or write past the end of the buffer, or varint being encoded.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
2e655e6de2 bcachefs: Add open_buckets to sysfs
This is to help debug a rare shutdown deadlock in the allocator code -
the btree code is leaking open_buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
bc3f8b25f3 bcachefs: Check for errors from bch2_trans_update()
Upcoming refactoring is going to change bch2_trans_update() to start
returning transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
01254036a3 bcachefs; Check for allocator thread shutdown
We were missing a kthread_should_stop() check in the loop in
bch2_invalidate_buckets(), very occasionally leading to us getting stuck
while shutting down.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
3a402c8dab bcachefs: Fix some refcounting bugs
We really need debug mode assertions that ca->ref and ca->io_ref are
used correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:03 -04:00
Kent Overstreet
ac1019d32b bcachefs: Clean up bch2_btree_and_journal_walk()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:02 -04:00
Kent Overstreet
89baec780f bcachefs: Allocator refactoring
This uses the kthread_wait_freezable() macro to simplify a lot of the
allocator thread code, along with cleaning up bch2_invalidate_bucket2().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:01 -04:00
Kent Overstreet
04903131db bcachefs: Handle errors in bch2_trans_mark_update()
It's not actually the case that iterators are always checked here -
__bch2_trans_commit() checks for that after running triggers.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:01 -04:00
Kent Overstreet
6ad060b0eb bcachefs: Allocator thread doesn't need gc_lock anymore
Even with runtime gc (which currently isn't supported), runtime gc no
longer clears/recalculates the main set of bucket marks - it allocates
and calculates another set, updating the primary at the end.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:01 -04:00
Kent Overstreet
dac1525d9c bcachefs: gc shouldn't care about owned_by_allocator
The owned_by_allocator field is a purely in memory thing, even if/when
we bring back GC at runtime there's no need for it to be recalculating
this field. This is prep work for pulling it out of struct bucket, and
eventually getting rid of the bucket array.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:01 -04:00
Kent Overstreet
d62ab355d7 bcachefs: Fix bch2_trans_mark_dev_sb()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:00 -04:00
Kent Overstreet
b1bd955ba5 bcachefs: Don't wait for ALLOC_SCAN_BATCH buckets in allocator
It used to be necessary for the allocator thread to batch up
invalidating buckets when possible - but since we added the btree key
cache that hasn't been a concern, and now it's causing the allocator
thread to livelock when the filesystem is nearly full.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:59 -04:00
Kent Overstreet
73590619ec bcachefs: Don't unconditially version_upgrade in initialize
This is mkfs's job. Also, clean up the handling of feature bits some.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
50dc0f692a bcachefs: Require all btree iterators to be freed
We keep running into occasional bugs with btree transaction iterators
overflowing - this will make those bugs more visible.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:56 -04:00
Kent Overstreet
2436cb9fad bcachefs: Use x-macros for more enums
This patch standardizes all the enums that have associated string tables
(probably more enums should have string tables).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
41f8b09edc bcachefs: Rename BTREE_ID enums for consistency with other enums
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:55 -04:00
Kent Overstreet
bae895a5a3 bcachefs: Add allocator thread state to sysfs
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:54 -04:00
Kent Overstreet
51c66fedc0 bcachefs: Rip out copygc pd controller
We have a separate mechanism for ratelimiting copygc now - the pd
controller has only been causing problems.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:54 -04:00
Kent Overstreet
cb66fc5fe4 bcachefs: Fix copygc threshold
Awhile back the meaning of is_available_bucket() and thus also
bch_dev_usage->buckets_unavailable changed to include buckets that are
owned by the allocator - this was so that the stat could be persisted
like other allocation information, and wouldn't have to be regenerated
by walking each bucket at mount time.

This broke copygc, which needs to consider buckets that are reclaimable
and haven't yet been grabbed by the allocator thread and moved onta
freelist. This patch fixes that by adding dev_buckets_reclaimable() for
copygc and the allocator thread, and cleans up some of the callers a bit.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:54 -04:00
Kent Overstreet
1b05778707 bcachefs: Add a cond_seched() to the allocator thread
This is just a band-aid fix for now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:54 -04:00
Kent Overstreet
59a7405161 bcachefs: Create allocator threads when allocating filesystem
We're seeing failures to mount because of a failure to start the
allocator threads, which currently happens fairly late in the mount
process, after walking all metadata, and kthread_create() fails if
something has tried to kill the mount process, which is probably not
what we want.

This patch avoids this issue by creating, but not starting, the
allocator threads when we preallocate all of our other in memory data
structures.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:53 -04:00
Kent Overstreet
dab9ef0d27 bcachefs: Add error message for some allocation failures
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:53 -04:00
Kent Overstreet
180fb49dea bcachefs: Journal updates to dev usage
This eliminates the need to scan every bucket to regenerate dev_usage at
mount time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:52 -04:00
Kent Overstreet
2abe542087 bcachefs: Persist 64 bit io clocks
Originally, bcachefs - going back to bcache - stored, for each bucket, a
16 bit counter corresponding to how long it had been since the bucket
was read from. But, this required periodically rescaling counters on
every bucket to avoid wraparound. That wasn't an issue in bcache, where
we'd perodically rewrite the per bucket metadata all at once, but in
bcachefs we're trying to avoid having to walk every single bucket.

This patch switches to persisting 64 bit io clocks, corresponding to the
64 bit bucket timestaps introduced in the previous patch with
KEY_TYPE_alloc_v2.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:52 -04:00
Kent Overstreet
7f4e1d5d0f bcachefs: KEY_TYPE_alloc_v2
This introduces a new version of KEY_TYPE_alloc, which uses the new
varint encoding introduced for inodes. This means we'll eventually be
able to support much larger bucket sizes (for SMR devices), and the
read/write time fields are expanded to 64 bits - which will be used in
the next patch to get rid of the periodic rescaling of those fields.

Also, for buckets that are members of erasure coded stripes, this adds
persistent fields for the index of the stripe they're members of and the
stripe redundancy. This is part of work to get rid of having to scan and
read into memory the alloc and stripes btrees at mount time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:52 -04:00
Kent Overstreet
4529ae09ce bcachefs: Fix an assertion
If we're invalidating a bucket that has cached data in it, data_type
won't be 0 - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:52 -04:00
Kent Overstreet
bfcf840ddf bcachefs: Mark superblocks transactionally
More work towards getting rid of the in memory struct bucket: this path
adds code for marking superblock and journal buckets via the btree, and
uses it in the device add and journal resize paths.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:52 -04:00
Kent Overstreet
9afc6652d1 bcachefs: Kill bch2_invalidate_bucket()
This patch is working towards eventually getting rid of the in memory
struct bucket, and relying only on the btree representation.

Since bch2_invalidate_bucket() was only used for incrementing gens, not
invalidating cached data, no other counters were being changed as a side
effect - meaning it's safe for the allocator code to increment the
bucket gen directly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:52 -04:00
Kent Overstreet
72eab8da47 bcachefs: Refactor dev usage
This is to make it more amenable for serialization.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:52 -04:00
Kent Overstreet
4291a3317f bcachefs: bch2_alloc_write() should be writing for all devices
Alloc info isn't stored on a particular device, it makes no sense to
only be writing it out for rw members - this was causing fsck to not fix
alloc info errors, oops.

Also, make sure we write out alloc info in other repair paths.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:50 -04:00
Kent Overstreet
3187aa8d57 bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much
Previously, we were using BTREE_INSERT_RESERVE in a lot of places where
it no longer makes sense.

 - we now have more open_buckets than we used to, and the reserves work
   better, so we shouldn't need to use BTREE_INSERT_RESERVE just because
   we're holding open_buckets pinned anymore.

 - We have the btree key cache for updates to the alloc btree, meaning
   we no longer need the btree reserve to ensure the allocator can make
   forward progress.

This means that we should only need a reserve for btree updates to
ensure that copygc can make forward progress.

Since it's now just for copygc, we can also fold RESERVE_BTREE into
RESERVE_MOVINGGC (the allocator's freelist reserve).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:50 -04:00
Kent Overstreet
f30dd86012 bcachefs: Don't write bucket IO time lazily
With the btree key cache code, we don't need to update the alloc btree
lazily - and this will mean we can remove the bch2_alloc_write() call in
the shutdown path.

Future work: we really need to expend the bucket IO clocks from 16 to 64
bits, so that we don't have to rescale them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:50 -04:00
Kent Overstreet
b7a9bbfc1b bcachefs: Move journal reclaim to a kthread
This is to make tracing easier.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:48 -04:00
Kent Overstreet
39283c712e bcachefs: Fix for bad stripe pointers
The allocator usually doesn't increment bucket gens right away on
buckets that it's about to hand out (for reasons that need to be
documented), instead deferring that to whatever extent update first
references that bucket.

But stripe pointers reference buckets without changing bucket sector
counts, meaning we could end up with a pointer in a stripe with a gen
newer than the bucket it points to.

Fix this by adding a transactional trigger for KEY_TYPE_stripe that just
writes out the keys in the alloc btree for the buckets it points to.

Also - consolidate the code that checks pointer validity.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
289980195f bcachefs: Start/stop io clock hands in read/write paths
This fixes a bug where the clock hands in the journal and superblock
didn't match, because we were still incrementing the read clock hand
while read-only.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
8d6b6222bf bcachefs: Improvements to writing alloc info
Now that we've got transactional alloc info updates (and have for
awhile), we don't need to write it out on shutdown, and we don't need to
write it out on startup except when GC found errors - this is a big
improvement to mount/unmount performance.

This patch also fixes a few bugs where we weren't writing out alloc
info (on new filesystems, and new devices) and should have been.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
f3721e12d0 bcachefs: Perf improvements for bch_alloc_read()
On large filesystems reading in the alloc info takes a significant
amount of time. But we don't need to be calling into the fully general
bch2_mark_key() path, just open code what we need in
bch2_alloc_read_fn().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
f9adbb7d5d bcachefs: Add a cond_resched() to bch2_alloc_write()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
74ed7e560b bcachefs: Don't let copygc buckets be stolen by other threads
And assorted other copygc fixes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
3d080aa52f bcachefs: Delete unused arguments
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:43 -04:00
Kent Overstreet
e6d1161530 bcachefs: Make copygc thread global
Per device copygc threads don't move data to different devices and they
make fragmentation works - they don't make much sense anymore.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00
Kent Overstreet
89fd25be70 bcachefs: Use x-macros for data types
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:42 -04:00
Kent Overstreet
eff508b459 bcachefs: Add a kthread_should_stop() check to allocator thread
Turns out it's possible during shutdown for the allocator to get stuck
spinning on bch2_invalidate_buckets() without hitting any of the other
checks.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:41 -04:00
Kent Overstreet
7dd1ebfa1e bcachefs: Increase size of btree node reserve
Also tweak the allocator to be more aggressive about keeping it full.
The recent changes to make updates to interior nodes transactional (and
thus generate updates to the alloc btree) all put more stress on the
btree node reserves.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:41 -04:00
Kent Overstreet
5d20ba48f0 bcachefs: Use cached iterators for alloc btree
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:41 -04:00
Kent Overstreet
255adc515a bcachefs: Always increment bucket gen on bucket reuse
Not doing so confuses copygc

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:40 -04:00
Kent Overstreet
a27443bc76 bcachefs: Kill old allocator startup code
It's not needed anymore since we can now write to buckets before
updating the alloc btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:40 -04:00