35 Commits

Author SHA1 Message Date
Kent Overstreet
880e2275f9 bcachefs: Move trigger fns to bkey_ops
This replaces the switch statements in bch2_mark_key(),
bch2_trans_mark_key() with new bkey methods - prep work for the next
patch, which fixes BTREE_TRIGGER_WANTS_OLD_AND_NEW.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:28 -04:00
Kent Overstreet
ec061b215d bcachefs: btree_gc no longer uses main in-memory bucket array
This changes the btree_gc code to only use the second bucket array, the
one dedicated to GC. On completion, it compares what's in its in memory
bucket array to the allocation information in the btree and writes it
directly, instead of updating the main in-memory bucket array and
writing that.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:23 -04:00
Kent Overstreet
abe19d458e bcachefs: Refactor open_bucket code
Prep work for adding a hash table of open buckets - instead of embedding
a bch_extent_ptr, we need to refer to the bucket directly so that we're
not calling sector_to_bucket() in the hash table lookup code, which has
an expensive divide.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:20 -04:00
Kent Overstreet
20572300dc bcachefs: Improve alloc_mem_to_key()
This moves some common code into alloc_mem_to_key(), which translates
from the in-memory format for a bucket to the btree key format.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:18 -04:00
Kent Overstreet
fb0e480872 bcachefs: bch2_alloc_write()
This adds a new helper that much like the one we have for inode updates,
that allocates the packed alloc key, packs it and calls
bch2_trans_update.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:18 -04:00
Kent Overstreet
b547d005d5 bcachefs: Erasure coding fixes
When we added the stripe and stripe_redundancy fields to alloc keys, we
neglected to add them to the functions that convert back and forth with
the in-memory types.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:18 -04:00
Kent Overstreet
3e52c22255 bcachefs: Add journal_seq to inode & alloc keys
Add fields to inode & alloc keys that record the journal sequence number
when they were most recently modified.

For alloc keys, this is needed to know what journal sequence number we
have to flush before the bucket can be reused. Currently this is tracked
in memory, but we'll be getting rid of the in memory bucket array.

For inodes, this is needed for fsync when the inode has been evicted
from the vfs cache. Currently we use a bloom filter per outstanding
journal buf - but that mechanism has been broken since we added the
ability to not issue a flush/fua for every journal write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:16 -04:00
Kent Overstreet
2e655e6de2 bcachefs: Add open_buckets to sysfs
This is to help debug a rare shutdown deadlock in the allocator code -
the btree code is leaking open_buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
ac1019d32b bcachefs: Clean up bch2_btree_and_journal_walk()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:02 -04:00
Kent Overstreet
2ee47eec44 bcachefs: Fix for copygc getting stuck waiting for reserve to be filled
This fixes a regression from the patch
  bcachefs: Fix copygc dying on startup

In general only the allocator thread itself should be updating
ca->allocator_state, the thread waking up the allocator setting it is an
ugly hack only needed to avoid racing with the copygc threads when we're
first starting up.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:54 -04:00
Kent Overstreet
bae895a5a3 bcachefs: Add allocator thread state to sysfs
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:54 -04:00
Kent Overstreet
7f4e1d5d0f bcachefs: KEY_TYPE_alloc_v2
This introduces a new version of KEY_TYPE_alloc, which uses the new
varint encoding introduced for inodes. This means we'll eventually be
able to support much larger bucket sizes (for SMR devices), and the
read/write time fields are expanded to 64 bits - which will be used in
the next patch to get rid of the periodic rescaling of those fields.

Also, for buckets that are members of erasure coded stripes, this adds
persistent fields for the index of the stripe they're members of and the
stripe redundancy. This is part of work to get rid of having to scan and
read into memory the alloc and stripes btrees at mount time.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:52 -04:00
Kent Overstreet
bfcf840ddf bcachefs: Mark superblocks transactionally
More work towards getting rid of the in memory struct bucket: this path
adds code for marking superblock and journal buckets via the btree, and
uses it in the device add and journal resize paths.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:52 -04:00
Kent Overstreet
f30dd86012 bcachefs: Don't write bucket IO time lazily
With the btree key cache code, we don't need to update the alloc btree
lazily - and this will mean we can remove the bch2_alloc_write() call in
the shutdown path.

Future work: we really need to expend the bucket IO clocks from 16 to 64
bits, so that we don't have to rescale them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:50 -04:00
Kent Overstreet
29364f3453 bcachefs: Drop sysfs interface to debug parameters
It's not used much anymore, the module paramter interface is better.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:45 -04:00
Kent Overstreet
39283c712e bcachefs: Fix for bad stripe pointers
The allocator usually doesn't increment bucket gens right away on
buckets that it's about to hand out (for reasons that need to be
documented), instead deferring that to whatever extent update first
references that bucket.

But stripe pointers reference buckets without changing bucket sector
counts, meaning we could end up with a pointer in a stripe with a gen
newer than the bucket it points to.

Fix this by adding a transactional trigger for KEY_TYPE_stripe that just
writes out the keys in the alloc btree for the buckets it points to.

Also - consolidate the code that checks pointer validity.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
8d6b6222bf bcachefs: Improvements to writing alloc info
Now that we've got transactional alloc info updates (and have for
awhile), we don't need to write it out on shutdown, and we don't need to
write it out on startup except when GC found errors - this is a big
improvement to mount/unmount performance.

This patch also fixes a few bugs where we weren't writing out alloc
info (on new filesystems, and new devices) and should have been.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
9f20ed157d bcachefs: Fix copygc dying on startup
The copygc threads errors out and makes the filesystem go RO if it ever
tries to run and discovers it has no reserve allocated - which is a
problem if it races with the allocator thread and its reserve hasn't
been filled yet.

The allocator thread doesn't start filling the copygc reserve until
after BCH_FS_STARTED has been set, so make sure to wake up the allocator
threads after setting that and before starting copygc.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:44 -04:00
Kent Overstreet
5d20ba48f0 bcachefs: Use cached iterators for alloc btree
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:41 -04:00
Kent Overstreet
a27443bc76 bcachefs: Kill old allocator startup code
It's not needed anymore since we can now write to buckets before
updating the alloc btree.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:40 -04:00
Kent Overstreet
6671a7089f bcachefs: Refactor bch2_alloc_write()
Major simplification - gets rid of the need for marking buckets as
dirty, instead we write buckets if the in memory mark is different from
what's in the btree.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:26 -04:00
Kent Overstreet
67163cded3 bcachefs: Trust in memory bucket mark
This fixes a bug in the journal replay -> extent_replay_key ->
split_compressed path, when we do an update that changes alloc info but
the alloc info in the btree isn't up to date yet.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:26 -04:00
Kent Overstreet
460651ee86 bcachefs: Various improvements to bch2_alloc_write()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:21 -04:00
Kent Overstreet
932aa83745 bcachefs: bch2_trans_mark_update()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:21 -04:00
Kent Overstreet
d07343561e bcachefs: Deduplicate keys in the journal before replay
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:20 -04:00
Kent Overstreet
a0e0bda117 bcachefs: Pass flags arg to bch2_alloc_write()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:20 -04:00
Kent Overstreet
8fe826f90a bcachefs: Convert bucket invalidation to key marking path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:16 -04:00
Kent Overstreet
430735cd1a bcachefs: Persist alloc info on clean shutdown
- Does not persist alloc info for stripes yet
 - Also does not yet include filesystem block/sector counts yet, from
struct fs_usage
 - Not made use of just yet

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:14 -04:00
Kent Overstreet
d0cc3defba bcachefs: More allocator startup improvements
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:14 -04:00
Kent Overstreet
26609b619f bcachefs: Make bkey types globally unique
this lets us get rid of a lot of extra switch statements - in a lot of
places we dispatch on the btree node type, and then the key type, so
this is a nice cleanup across a lot of code.

Also improve the on disk format versioning stuff.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet
61274e9d45 bcachefs: Allocator startup improvements
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:12 -04:00
Kent Overstreet
319f9ac38e bcachefs: revamp to_text methods
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:11 -04:00
Kent Overstreet
8b335baef2 bcachefs: Assorted fixes for running on very small devices
It's now possible to create and use a filesystem on a 512k device with
4k buckets (though at that size we still waste almost half to internal
reserves)

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:11 -04:00
Kent Overstreet
b092dadd55 bcachefs: Scale down number of writepoints when low on space
this means we don't have to reserve space for them when calculating
filesystem capacity

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:11 -04:00
Kent Overstreet
7b3f84ea7d bcachefs: Split out alloc_background.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:10 -04:00