193 Commits

Author SHA1 Message Date
Kent Overstreet
990d42d187 bcachefs: Split out struct gc_stripe from struct stripe
We have two radix trees of stripes - one that mirrors some information
from the stripes btree in normal operation, and another that GC uses to
recalculate block usage counts.

The normal one is now only used for finding partially empty stripes in
order to reuse them - the normal stripes radix tree and the GC stripes
radix tree are used significantly differently, so this patch splits them
into separate types.

In an upcoming patch we'll be replacing c->stripes with a btree that
indexes stripes by the order we want to reuse them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:18 -04:00
Kent Overstreet
94a3e1a6c1 bcachefs: bch2_trans_update() is now __must_check
With snapshots, bch2_trans_update() has to check if we need a whitout,
which can cause a transaction restart, so this is important now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:18 -04:00
Kent Overstreet
b547d005d5 bcachefs: Erasure coding fixes
When we added the stripe and stripe_redundancy fields to alloc keys, we
neglected to add them to the functions that convert back and forth with
the in-memory types.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:18 -04:00
Kent Overstreet
181fe42a75 bcachefs: Handle replica marking fsck errors locally
This simplifies the code quite a bit and eliminates an inconsistency - a
given bkey doesn't necessarily translate to a single replicas entry for
disk space accounting.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:18 -04:00
Kent Overstreet
58e1ea4bcb bcachefs: Push c->mark_lock usage down to where it is needed
This changes the bch2_mark_key() and related paths to take mark lock
where it is needed, instead of taking it in the upper transaction commit
path - by pushing down locking we'll be able to handle fsck errors
locally instead of requiring a separate check in the btree_gc code for
replicas being marked.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:18 -04:00
Kent Overstreet
502cfb3591 bcachefs: Kill bch2_replicas_delta_list_marked()
This changes bch2_trans_fs_usage_apply() to handle failure (replicas
entry missing) by reverting the changes it made - meaning we can make
the main transaction commit path a bit slimmer, and perhaps also
simplify some locking in upcoming patches.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:18 -04:00
Kent Overstreet
f0c3f88b35 bcachefs: Run insert triggers before overwrite triggers
Currently, btree triggers are run in natural key order, which presents a
problem for fallocate in INSERT_RANGE mode: since we're moving existing
extents to higher offsets, the trigger for deleting the old extent runs
before the trigger that adds the new extent, potentially leading to
indirect extents being deleted that shouldn't be when the delete causes
the refcount to hit 0.

This changes the order we run triggers so that for a givin btree, we run
all insert triggers before overwrite triggers, nicely sidestepping this
issue.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:17 -04:00
Kent Overstreet
c714614bd0 bcachefs: Disk space accounting fix on brand-new fs
The filesystem initialization path first marks superblock and journal
buckets non transactionally, since the btree isn't functional yet. That
path was updating the per-journal-buf percpu counters via
bch2_dev_usage_update(), and updating the wrong set of counters so those
updates didn't get written out until journal entry 4.

The relevant code is going to get significantly rewritten in the future
as we transition away from the in memory bucket array, so this just
hacks around it for now.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:17 -04:00
Kent Overstreet
076c783cd3 bcachefs: Fix upgrade path for reflink_p fix
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:16 -04:00
Kent Overstreet
3e52c22255 bcachefs: Add journal_seq to inode & alloc keys
Add fields to inode & alloc keys that record the journal sequence number
when they were most recently modified.

For alloc keys, this is needed to know what journal sequence number we
have to flush before the bucket can be reused. Currently this is tracked
in memory, but we'll be getting rid of the in memory bucket array.

For inodes, this is needed for fsync when the inode has been evicted
from the vfs cache. Currently we use a bloom filter per outstanding
journal buf - but that mechanism has been broken since we added the
ability to not issue a flush/fua for every journal write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:16 -04:00
Kent Overstreet
2debb1b875 bcachefs: BTREE_TRIGGER_INSERT now only means insert
This allows triggers to distinguish between a key entering the btree -
i.e. being called from the trans commit path - vs. being called on a key
that already exists, i.e. by GC.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:15 -04:00
Kent Overstreet
904823de49 bcachefs: Convert bch2_mark_key() to take a btree_trans *
This helps to unify the interface between bch2_mark_key() and
bch2_trans_mark_key() - and it also gives access to the journal
reservation and journal seq in the mark_key path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:15 -04:00
Kent Overstreet
961b2d6282 bcachefs: Assorted ec fixes
- The backpointer that ec_stripe_update_ptrs() uses now needs to include
  the snapshot ID, which means we have to change where we add the
  backpointer to after getting the snapshot ID for the new extents

- ec_stripe_update_ptrs() needs to be calling bch2_trans_begin()

- improve error message in bch2_mark_stripe()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:15 -04:00
Kent Overstreet
37f72492f4 bcachefs: Fix bch2_mark_update()
When the old or new key doesn't exist, we should still pass in a deleted
key with the correct pos. This fixes a bug in the ec code, when
bch2_mark_stripe() was looking up the wrong in-memory stripe.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:15 -04:00
Kent Overstreet
f3b1e19379 bcachefs: Improve error messages in trans_mark_reflink_p()
We should always print out the key we were marking.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:15 -04:00
Kent Overstreet
396a887d8f bcachefs: Fix fsck path for refink pointers
The way __bch2_mark_reflink_p returns errors was clashing with returning
the number of sectors processed - we weren't returning FSCK_ERR_EXIT
correctly.

Fix this by only using the return code for errors, which actually ends
up simplifying the overall logic.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:15 -04:00
Kent Overstreet
6d76aefea1 bcachefs: Fix for leaking of reflinked extents
When a reflink pointer points to only part of an indirect extent, and
then that indirect extent is fragmented (e.g. by copygc), if the reflink
pointer only points to one of the fragments we leak a reference.

Fix this by storing front/back pad values in reflink pointers - when
inserting reflink pointesr, we initialize them to cover the full range
of the indirect extents we reference.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:14 -04:00
Kent Overstreet
dfc276df91 bcachefs: Improve reflink repair code
When a reflink pointer points to an indirect extent that doesn't exist,
we need to replace it with a KEY_TYPE_error key.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:14 -04:00
Kent Overstreet
14b393ee76 bcachefs: Subvolumes, snapshots
This patch adds subvolume.c - support for the subvolumes and snapshots
btrees and related data types and on disk data structures. The next
patches will start hooking up this new code to existing code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:12 -04:00
Kent Overstreet
67e0dd8f0d bcachefs: btree_path
This splits btree_iter into two components: btree_iter is now the
externally visible componont, and it points to a btree_path which is now
reference counted.

This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:11 -04:00
Kent Overstreet
6fba6b83b4 bcachefs: Prefer using btree_insert_entry to btree_iter
This moves some data dependencies forward, to improve pipelining.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:11 -04:00
Brett Holman
fd0bd123d5 bcachefs: Fix 32 bit build failures
This fix replaces multiple 64 bit divisions with do_div() equivalents.

Signed-off-by: Brett Holman <bholman.devel@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:10 -04:00
Kent Overstreet
62df3d443c bcachefs: Disk space accounting fix
DIV_ROUND_UP() wasn't doing what we wanted when passing it negative
numbers - fix it by just not passing it negative numbers anymore.

Also, no need to do the scaling by compression ratio for incompressible
data.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:10 -04:00
Kent Overstreet
297d89343d bcachefs: Extensive triggers cleanups
- We no longer mark subsets of extents, they're marked like regular
   keys now - which means we can drop the offset & sectors arguments
   to trigger functions
 - Drop other arguments that are no longer needed anymore in various
   places - fs_usage
 - Drop the logic for handling extents in bch2_mark_update() that isn't
   needed anymore, to match bch2_trans_mark_update()
 - Better logic for hanlding the BTREE_ITER_CACHED_NOFILL case, where we
   don't have an old key to mark

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:07 -04:00
Kent Overstreet
8c3f6da9fc bcachefs: Improve iter->should_be_locked
Adding iter->should_be_locked introduced a regression where it ended up
not being set on the iterator passed to bch2_btree_update_start(), which
is definitely not what we want.

This patch requires it to be set when calling bch2_trans_update(), and
adds various fixups to make that happen.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:06 -04:00
Kent Overstreet
8ee529e9c1 bcachefs: Make sure bch2_trans_mark_update uses correct iter flags
Now that bch2_btree_iter_peek_with_updates() has been removed in favor
of BTREE_ITER_WITH_UPDATES, we need to make sure it's not used where we
don't want it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:06 -04:00
Kent Overstreet
290448ed2e bcachefs: Don't underflow c->sectors_available
This rarely used error path should've been checking for underflow -
oops.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:06 -04:00
Kent Overstreet
953ee28a3e bcachefs: Kill bch2_btree_iter_peek_cached()
It's now been rolled into bch2_btree_iter_peek_slot()

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:06 -04:00
Kent Overstreet
c1949baa51 bcachefs: Simplify reflink trigger
Now that we only mark entire extents, we can ditch the
"reflink_p_frag_references" code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:06 -04:00
Kent Overstreet
8e6bbc4181 bcachefs: Move extent_handle_overwrites() to bch2_trans_update()
This lifts handling of overlapping extents out of __bch2_trans_commit()
and moves it to where we first do the update - which means that
BTREE_ITER_WITH_UPDATES can now work correctly in extents mode.

Also, this patch reworks how extent triggers work: previously, on
partial extent overwrite we would pass this information to the trigger,
telling it what part of the extent was being overwritten. But, this
approach has had too many subtle corner cases - now, we only mark whole
extents, meaning on partial extent overwrite we unmark the old extent
and mark the new extent.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:06 -04:00
Kent Overstreet
224ec3e677 bcachefs: Don't mark superblocks past end of usable space
bcachefs-tools recently started putting a backup superblock at the end
of the device. This causes a problem if the bucket size doesn't divide
the device size - but we can fix it by just skipping marking that part.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
bc3f8b25f3 bcachefs: Check for errors from bch2_trans_update()
Upcoming refactoring is going to change bch2_trans_update() to start
returning transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
890b74f03d bcachefs: Fsck for reflink refcounts
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:05 -04:00
Kent Overstreet
9eba7c8d15 bcachefs: Reflink refcount fix
__bch2_trans_mark_reflink_p wasn't always correctly returning the number
of sectors processed - the new logic is a bit more straightforward
overall too.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:04 -04:00
Kent Overstreet
7e94eeffe0 bcachefs: Inline fastpath of bch2_disk_reservation_add()
The fastpath now doesn't even disable preemption - instead we use a (non
locked) cmpxchg.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:04 -04:00
Dan Robertson
ed34341189 bcachefs: statfs resports incorrect avail blocks
The current implementation of bch_statfs does not scale the number of
available blocks provided in f_bavail by the reserve factor. This causes
an allocation of a file of this size to fail.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:03 -04:00
Kent Overstreet
bbfcb4519d bcachefs: Fix bch2_extent_can_insert() call
It was being skipped when hole punching, leading to problems when
splitting compressed extents.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:03 -04:00
Brett Holman
2cd0563461 bcachefs: made changes to support clang, fixed a couple bugs
fs/bcachefs/bset.c              edited prefetch macro to add clang support
fs/bcachefs/btree_iter.c        bugfix: initialize iter->real_pos in bch2_btree_iter_init for later use
fs/bcachefs/io.c                bugfix: eliminated undefined behavior (negative bitshift)
fs/bcachefs/buckets.c           bugfix: invert sign to handle 64bit abs()

Signed-off-by: Brett Holman <bpholman5@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:03 -04:00
Dan Robertson
d125615a4e bcachefs: properly initialize used values
- Ensure the second key value in bch_hash_info is initialized to zero
   if the info type is of type BCH_STR_HASH_SIPHASH.

 - Initialize the possibly returned value in bch2_inode_create. Assuming
   bch2_btree_iter_peek returns bkey_s_c_null, the uninitialized value
   of ret could be returned to the user as an error pointer.

 - Fix compiler warning in initialization of bkey_s_c_stripe

fs/bcachefs/buckets.c:1646:35: warning: suggest braces around initialization
of subobject [-Wmissing-braces]
        struct bkey_s_c_stripe new_s = { NULL };
                                         ^~~~

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:03 -04:00
Kent Overstreet
933532b8b2 bcachefs: Fix reflink trigger
The trigger for reflink pointers wasn't always incrementing/decrementing
the refcounts correctly - this patch fixes that logic.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:03 -04:00
Kent Overstreet
3a402c8dab bcachefs: Fix some refcounting bugs
We really need debug mode assertions that ca->ref and ca->io_ref are
used correctly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:03 -04:00
Kent Overstreet
d99af4f194 bcachefs: Call bch2_inconsistent_error() on missing stripe/indirect extent
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:02 -04:00
Kent Overstreet
eb365fbc33 bcachefs: Don't BUG() in update_replicas
Apparently, we have a bug where in mark and sweep while accounting for a
key, a replicas entry isn't found. Change the code to print out the key
we couldn't mark and halt instead of a BUG_ON().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:01 -04:00
Kent Overstreet
04903131db bcachefs: Handle errors in bch2_trans_mark_update()
It's not actually the case that iterators are always checked here -
__bch2_trans_commit() checks for that after running triggers.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:01 -04:00
Kent Overstreet
dac1525d9c bcachefs: gc shouldn't care about owned_by_allocator
The owned_by_allocator field is a purely in memory thing, even if/when
we bring back GC at runtime there's no need for it to be recalculating
this field. This is prep work for pulling it out of struct bucket, and
eventually getting rid of the bucket array.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:01 -04:00
Kent Overstreet
d62ab355d7 bcachefs: Fix bch2_trans_mark_dev_sb()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:00 -04:00
Kent Overstreet
319c130507 bcachefs: Fix heap overrun in bch2_fs_usage_read() XXX squash
oops

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:00 -04:00
Kent Overstreet
ecc1420944 bcachefs: Fix an uninitialized variable
Fortunately it was just used in an error message

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:59 -04:00
Kent Overstreet
35d5aff263 bcachefs: Kill bch2_fs_usage_scratch_get()
This is an important cleanup, eliminating an unnecessary copy in the
transaction commit path.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:59 -04:00
Kent Overstreet
9c2e624290 bcachefs: Fix livelock calling bch2_mark_bkey_replicas()
The bug was that we were trying to find a replicas entry that wasn't
sorted - but, we can also simplify the code by not using
bch2_mark_bkey_replicas and instead ensuring the list of replicas
entries exists directly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:08:59 -04:00