1443 Commits

Author SHA1 Message Date
Kent Overstreet
5db95e50e1 bcachefs: Re-implement extent merging in transaction commit path
We haven't had extent merging in quite some time. It used to be done by
the btree code when sorting btree nodes, but that was eliminated as part
of the work to separate extent handling from core btree code.

This patch re-implements extent merging in the transaction commit path.
We don't currently have the ability to merge reflink pointers, we need
to do some work on the triggers code to be able to do that without
ending up with incorrect refcounts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:06 -04:00
Kent Overstreet
81d22e5d83 bcachefs: Refactor extent_handle_overwrites()
Prep work for extent merging

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:06 -04:00
Kent Overstreet
59ba21d99f bcachefs: Clean up key merging
This patch simplifies the key merging code by getting rid of partial
merges - it's simpler and saner if we just don't merge extents when
they'd overflow k->size.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:06 -04:00
Kent Overstreet
cd8319fdd9 bcachefs: Kill trans->updates2
Now that extent handling has been lifted to bch2_trans_update(), we
don't need to keep two different lists of updates.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:06 -04:00
Kent Overstreet
c1949baa51 bcachefs: Simplify reflink trigger
Now that we only mark entire extents, we can ditch the
"reflink_p_frag_references" code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:06 -04:00
Kent Overstreet
8e6bbc4181 bcachefs: Move extent_handle_overwrites() to bch2_trans_update()
This lifts handling of overlapping extents out of __bch2_trans_commit()
and moves it to where we first do the update - which means that
BTREE_ITER_WITH_UPDATES can now work correctly in extents mode.

Also, this patch reworks how extent triggers work: previously, on
partial extent overwrite we would pass this information to the trigger,
telling it what part of the extent was being overwritten. But, this
approach has had too many subtle corner cases - now, we only mark whole
extents, meaning on partial extent overwrite we unmark the old extent
and mark the new extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:06 -04:00
Kent Overstreet
b1d87f527d bcachefs: bch2_btree_iter_peek_slot() now saves initial position when searching
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:06 -04:00
Kent Overstreet
1d214eb18d bcachefs: Kill __bch2_btree_iter_peek_slot_extents()
This codepath won't just be for extents in the future, it'll also be for
BTREE_ITER_FILTER_SNAPSHOTS mode.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:05 -04:00
Kent Overstreet
e750296bf5 bcachefs: bch2_btree_iter_peek_slot() now supports BTREE_ITER_WITH_UPDATES
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:05 -04:00
Kent Overstreet
5288e66a7b bcachefs: BTREE_ITER_WITH_UPDATES
This drops bch2_btree_iter_peek_with_updates() and replaces it with a
new flag, BTREE_ITER_WITH_UPDATES, and also reworks
bch2_btree_iter_peek_slot() to respect it too.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:05 -04:00
Kent Overstreet
509d3e0a8d bcachefs: Child btree iterators
This adds the ability for btree iterators to own child iterators - to be
used by an upcoming rework of bch2_btree_iter_peek_slot(), so we can
scan forwards while maintaining our current position.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
c205321b12 bcachefs: Drop all btree locks when submitting btree node reads
As a rule we don't want to be holding btree locks while submitting IO -
this will improve overall filesystem latency.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
4351d3ecb4 bcachefs: More topology repair code
This improves the handling of overlapping btree nodes; now, we handle
the case where one btree node completely overwrites another.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
74cc1abdbf bcachefs: Fix a buffer overrun
In make_extent_indirect(), we were allocating too small of a buffer for
the new indirect extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
224ec3e677 bcachefs: Don't mark superblocks past end of usable space
bcachefs-tools recently started putting a backup superblock at the end
of the device. This causes a problem if the bucket size doesn't divide
the device size - but we can fix it by just skipping marking that part.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
7138f22097 bcachefs: Fix a spurious debug mode assertion
When we switched to using bch2_btree_bset_insert_key() for extents it
turned out it started leaving invalid keys around - of type deleted but
nonzero size - but this is fine (if ugly) because they're never written
out.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Brett Holman
ca47fa2362 bcachefs: Fix unitialized use of a value
Signed-off-by: Brett Holman <bpholman5@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:05 -04:00
Dan Robertson
59e2480ff7 bcachefs: do not compile acl mod on minimal config
Do not compile the acl.o target if BCACHEFS_POSIX_ACL is not enabled.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:05 -04:00
Kent Overstreet
66a0a49750 bcachefs: btree_iter->should_be_locked
Add a field to struct btree_iter for tracking whether it should be
locked - this fixes spurious transaction restarts in
bch2_trans_relock().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
531a0095c9 bcachefs: Improve btree iterator tracepoints
This patch adds some new tracepoints to the btree iterator code, and
adds new fields to the existing tracepoints - primarily for the iterator
position.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
f7beb4ca04 bcachefs: Preallocate transaction mem
This helps avoid transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
bc3f8b25f3 bcachefs: Check for errors from bch2_trans_update()
Upcoming refactoring is going to change bch2_trans_update() to start
returning transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
01254036a3 bcachefs; Check for allocator thread shutdown
We were missing a kthread_should_stop() check in the loop in
bch2_invalidate_buckets(), very occasionally leading to us getting stuck
while shutting down.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
d7fc453bdb bcachefs: Journal space calculation fix
When devices have different bucket sizes, we may accumulate a journal
write that doesn't fit on some of our devices - previously, we'd
underflow when calculating space on that device and then everything
would get weird.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
649d9a4dfc bcachefs: Don't fragment extents when making them indirect
This fixes a "disk usage increased without a reservation" bug, when
reflinking compressed extents. Also, there's no good reason for reflink
to be fragmenting extents anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
890b74f03d bcachefs: Fsck for reflink refcounts
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
c0ebe3e48c bcachefs: Assorted endianness fixes
Found by sparse

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
ee7570546e bcachefs: Fix a deadlock
Waiting on a btree node write with btree locks held can deadlock, if the
write errors: the write error path has to do do a btree update to drop
the pointer to the replica that errored.

The interior update path has to wait on in flight btree writes before
freeing nodes on disk. Previously, this was done in
bch2_btree_interior_update_will_free_node(), and could deadlock; now, we
just stash a pointer to the node and do it in
btree_update_nodes_written(), just prior to the transactional part of
the update.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:05 -04:00
Kent Overstreet
9f2772c454 bcachefs: Split out btree_error_wq
We can't use btree_update_wq becuase btree updates may be waiting on
btree writes to complete.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:04 -04:00
Kent Overstreet
bff796ae65 bcachefs: Fix pathalogical behaviour with inode sharding by cpu ID
If the transactior restarts on a different CPU, it could end up needing
to read in a different btree node, which makes another transaction
restart more likely...

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
d797ca3d8e bcachefs: Fix journal write error path
Journal write errors were racing with the submission path - potentially
causing writes to other replicas to not get submitted.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
9eba7c8d15 bcachefs: Reflink refcount fix
__bch2_trans_mark_reflink_p wasn't always correctly returning the number
of sectors processed - the new logic is a bit more straightforward
overall too.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
b282a74fae bcachefs: Add an option to control sharding new inode numbers
We're seeing a bug where inode creates end up spinning in
bch2_inode_create - disabling sharding will simplify what we're testing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
9f311f2166 bcachefs: Don't use bch_write_op->cl for delivering completions
We already had op->end_io as an alternative mechanism to op->cl.parent
for delivering write completions; this switches all code paths to using
op->end_io.

Two reasons:
 - op->end_io is more efficient, due to fewer atomic ops, this completes
   the conversion that was originally only done for the direct IO path.
 - We'll be restructing the write path to use a different mechanism for
   punting to process context, refactoring to not use op->cl will make
   that easier.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:04 -04:00
Kent Overstreet
af17118319 bcachefs: Kill bch_write_op.index_update_fn
This deletes bch_write_op.index_update_fn: indirect function calls have
gotten considerably more expensive post spectre/meltdown, and we only
have two different index_update_fns - this patch adds a flag to specify
which one to use (normal vs. data move path).

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:04 -04:00
Kent Overstreet
7e94eeffe0 bcachefs: Inline fastpath of bch2_disk_reservation_add()
The fastpath now doesn't even disable preemption - instead we use a (non
locked) cmpxchg.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:04 -04:00
Kent Overstreet
ddc7dd62f0 bcachefs: Don't use uuid in tracepoints
%pU for printing out pointers to uuids doesn't work in perf trace

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
19d2819d2d bcachefs: Add a tracepoint for copygc waiting
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
c4d4b2f01a bcachefs: Add a cond_resched call to the copygc main loop
We seem to have a bug where the copygc thread ends up spinning and
making the system unusable - this will at least prevent it from locking
up the machine, and it's a good thing to have anyways.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
443d2760e5 bcachefs: Fix a null ptr deref
bch2_btree_iter_peek() won't always return a key - whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
9dd89a05fd bcachefs: Fix an issue with inconsistent btree writes after unclean shutdown
After unclean shutdown, btree writes may have completed on one device
and not others - and this inconsistency could lead us to writing new
bsets with a gap in our btree node in one of our replicas.

Fortunately, this is only an issue with bsets that are newer than the
most recent journal flush, and we already have a mechanism for detecting
and blacklisting those. We just need to make sure to start new btree
writes after the most recent _non_ blacklisted bset.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
4495cbed56 bcachefs: Improve FS_IOC_GOINGDOWN ioctl
We weren't interpreting the flags argument at all.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
731bdd2eff bcachefs: Add a workqueue for btree io completions
Also, clean up workqueue usage - we shouldn't be using system
workqueues, pretty much everything we do needs to be on our own
WQ_MEM_RECLAIM workqueues.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Brett Holman
2eba51a69a bcachefs: rewrote prefetch asm in gas syntax for clang compatibility
Signed-off-by: Brett Holman <bpholman5@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:04 -04:00
Kent Overstreet
1ce0cf5fe9 bcachefs: Add a debug mode that always reads from every btree replica
There's a new module parameter, verify_all_btree_replicas, that enables
reading from every btree replica when reading in btree nodes and
comparing them against each other. We've been seeing some strange btree
corruption - this will hopefully aid in tracking it down and catching it
more often.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
596d3bdc1e bcachefs: Don't repair btree nodes until after interior journal replay is done
We need the btree to be in a consistent state before we can rewrite
btree nodes.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
304b7e08c7 bcachefs: Fix an uninitialized var
this fixes a valgrind complaint

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
a6336910b1 bcachefs: Fix for buffered writes getting -ENOSPC
Buffered writes may have to increase their disk reservation at btree
update time, due to compression and erasure coding being unpredictable:
O_DIRECT writes should be checking for -ENOSPC, but buffered writes have
already been accepted and should not.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
16ac8c9523 bcachefs: Fix inode backpointers in RENAME_OVERWRITE
When we delete the dirent an inode points to, we need to zero out the
backpointer fields - this was missed in the RENAME_OVERWRITE case.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
e7084c9c81 bcachefs: Make bch2_remap_range respect O_SYNC
Caught by xfstest generic/628

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00