85186 Commits

Author SHA1 Message Date
Kent Overstreet
955af63441 bcachefs: __bch2_trans_commit() no longer calls bch2_trans_reset()
It's now the caller's responsibility to call bch2_trans_begin.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
e829b7175b bcachefs: Ensure btree_iter_traverse() obeys iter->should_be_locked
iter->should_be_locked means that if bch2_btree_iter_relock() fails, we
need to restart the transaction.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
b4e09b351b bcachefs: bch2_btree_iter_traverse() shouldn't normally call traverse_all()
If there's more than one iterator in the btree_trans, it's requried to
call bch2_trans_begin() to handle transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
e5af273fce bcachefs: trans->restarted
Start tracking when btree transactions have been restarted - and assert
that we're always calling bch2_trans_begin() immediately after
transaction restart.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
3cc5288a62 bcachefs: Change lockrestart_do() to always call bch2_trans_begin()
More consistent behaviour means less likely to trip over ourselves in
silly ways.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
a88171c9e6 bcachefs: Clean up interior update paths
Btree node merging now happens prior to transaction commit, not after,
so we don't need to pay attention to BTREE_INSERT_NOUNLOCK.

Also, foreground_maybe_merge shouldn't be calling
bch2_btree_iter_traverse_all() - this is becoming private to the btree
iterator code and should only be called by bch2_trans_begin().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
700c25b32a bcachefs: Use bch2_trans_begin() more consistently
Upcoming patch will require that a transaction restart is always
immediately followed by bch2_trans_begin().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
8b3e9bd65f bcachefs: Always check for transaction restarts
On transaction restart iterators won't be locked anymore - make sure
we're always checking for errors.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
67b07638f1 bcachefs: traverse_all() is responsible for clearing should_be_locked
bch2_btree_iter_traverse_all() may loop, and it needs to clear
iter->should_be_locked on every iteration.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
fe5233979a bcachefs: bch2_trans_relock() only relocks iters that should be locked
This avoids unexpected lock restarts in bch2_btree_iter_traverse_all().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
6918bb55f6 bcachefs: Don't traverse iterators in __bch2_trans_commit()
They should already be traversed, and we're asserting that since the
introduction of iter->should_be_locked

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
a32b9573c7 bcachefs: Add an option for btree node mem ptr optimization
bch2_btree_node_ptr_v2 has a field for stashing a pointer to the in
memory btree node; this is safe because we clear this field when reading
in nodes from disk and we never free in memory btree nodes - but, we
have bug reports that indicate something might be faulty with this
optimization, so let's add an option for it.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
2b4e4b8cfa bcachefs: Minor tracepoint improvements
Btree iterator tracepoints should print whether they're for the key
cache.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
6e075b54a3 bcachefs: bch2_btree_iter_relock_intent()
This adds a new helper for btree_cache.c that does what we want where
the iterator is still being traverse - and also eliminates some
unnecessary transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
a6eba44b88 bcachefs: Use bch2_trans_do() in bch2_btree_key_cache_journal_flush()
We're working to standardize handling of transaction restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
ed5580b43b bcachefs: Fix a btree iterator leak
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
d7b21954b9 bcachefs: Pretty-ify bch2_bkey_val_to_text()
Don't print out the ": " when there isn't a value to print.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
3820054426 bcachefs: Don't squash return code in check_dirents()
We were squashing BCH_FSCK_ERRORS_NOT_FIXED.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
b97bbd4ec3 bcachefs: Use bch2_inode_find_by_inum() in truncate
This is needed for snapshots because we need to start handling lock
restarts even when just calling bch2_inode_peek().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
4909fe50b3 bcachefs: Handle lock restarts in bch2_xattr_get()
Snapshots add another btree lookup, thus we need to handle lock
restarts.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
5f87f3c116 bcachefs: Don't downgrade in traverse()
Downgrading of btree iterators is something that should only happen
explicitly.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
e719fc34f0 bcachefs: BSET_OFFSET()
Add a field to struct bset for the sector offset within the btree node
where it was written.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:09 -04:00
Kent Overstreet
47924527e6 Revert "bcachefs: statfs bfree and bavail should be the same"
This reverts commit 664f9847bec525d396d62d2db094ca9020289ae0.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:08 -04:00
Kent Overstreet
9f1833cadd bcachefs: Update btree ptrs after every write
This closes a significant hole (and last known hole) in our ability to
verify metadata. Previously, since btree nodes are log structured, we
couldn't detect lost btree writes that weren't the first write to a
given node. Additionally, this seems to have lead to some significant
metadata corruption on multi device filesystems with metadata
replication: since a write may have made it to one device and not
another, if we read that btree node back from the replica that did have
that write and started appending after that point, the other replica
would have a gap in the bset entries and reading from that replica
wouldn't find the rest of the bsets.

But, since updates to interior btree nodes are now journalled, we can
close this hole by updating pointers to btree nodes after every write
with the currently written number of sectors, without negatively
affecting performance. This means we will always detect lost or corrupt
metadata - it also means that our btree is now a curious hybrid of COW
and non COW btrees, with all the benefits of both (excluding
complexity).

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
f8f86c6aec bcachefs: Improve btree_bad_header() error message
We should always print out the full btree node ptr.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
eb7f44db8d bcachefs: Fixes for unit tests
The unit tests hadn't been updated for various recent btree changes -
this patch makes them work again.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
71f892a482 bcachefs: Fix bch2_btree_iter_rewind()
We'd hit a BUG() when rewinding at the start of the btree on btrees with
snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
914f2786b8 bcachefs: Improvements to fsck check_dirents()
The fsck code handles transaction restarts in a very ad hoc way, and not
always correctly. This patch makes some improvements to check_dirents(),
but more work needs to be done to figure out how this kind of code
should be structured.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
5aab663534 bcachefs: Tighten up btree_iter locking assertions
We weren't correctly verifying that we had interior node intent locks -
this patch also fixes bugs uncovered by the new assertions.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
5468f1195d bcachefs: Fix a memory leak in the dio write path
There were some error paths where we were leaking page refs - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:08 -04:00
Kent Overstreet
996fb577fd bcachefs: Add an option for whether inodes use the key cache
We probably don't ever want to flip this off in production, but it may
be useful for certain kinds of testing.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
9f6e1f7bb0 bcachefs: Fix an allocator shutdown deadlock
On fstest generic/388, we were seeing sporadic deadlocks in the
emergency shutdown, where we'd get stuck shutting down the allocator
because bch2_btree_update_start() -> bch2_btree_reserve_get() allocated
and then deallocated some btree nodes, putting them back on the
btree_reserve_cache, after the allocator shutdown code had already
cleared out that cache.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
8d34458781 bcachefs: Add safe versions of varint encode/decode
This adds safe versions of bch2_varint_(encode|decode) that don't read
or write past the end of the buffer, or varint being encoded.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
2e655e6de2 bcachefs: Add open_buckets to sysfs
This is to help debug a rare shutdown deadlock in the allocator code -
the btree code is leaking open_buckets.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
003e738d4f bcachefs: Ensure bad d_type doesn't oops in bch2_dirent_to_text()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
0a70089062 bcachefs: Kick off btree node writes from write completions
This is a performance improvement by removing the need to wait for the
in flight btree write to complete before kicking one off, which is going
to be needed to avoid a performance regression with the upcoming patch
to update btree ptrs after every btree write.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
2680325b78 bcachefs: Mask out unknown compat features when going read-write
Compat features should be cleared if the filesystem was touched by a
version that doesn't support them.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
19d5432445 bcachefs: Really don't hold btree locks while btree IOs are in flight
This is something we've attempted to stick to for quite some time, as it
helps guarantee filesystem latency - but there's a few remaining paths
that this patch fixes.

This is also necessary for an upcoming patch to update btree pointers
after every btree write - since the btree write completion path will now
be doing btree operations.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:08 -04:00
Kent Overstreet
e3a67bdb6e bcachefs: Regularize argument passing of btree_trans
btree_trans should always be passed when we have one - iter->trans is
disfavoured. This mainly updates old code in btree_update_interior.c,
some of which predates btree_trans.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Dan Robertson
d38494c462 bcachefs: docs: add docs for bch2_trans_reset
Add basic kernel docs for bch2_trans_reset and bch2_trans_begin.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:08 -04:00
Dan Robertson
f0412b6e44 bcachefs: set disk state should check new_state
A new device state that is not a valid state should return -EINVAL
in the disk set state ioctl.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:08 -04:00
Kent Overstreet
b00fde8fb1 bcachefs: BTREE_UPDATE_INTERNAL_SNAPSHOT_NODE
Add a new flag to control assertions about updating to internal snapshot
nodes, that normally should not be written to - to be used in an
upcoming patch.

Also do some renaming - trigger_flags is now update_flags.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
d5bee8ca5a bcachefs: bch2_d_types[]
Add readable names for d_type, and use it in dirent_to_text().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:08 -04:00
Kent Overstreet
c21affdd06 bcachefs: Fix bch2_btree_iter_peek_slot() assertion
This assertion is checking that what the iterator points to is
consistent with iter->real_pos, and since it's an internal btree
ordering property it should be using bpos_cmp.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:07 -04:00
Kent Overstreet
618b1c0e20 bcachefs: Split out SPOS_MAX
Internal btree code really wants a POS_MAX with all fields ~0; external
code more likely wants the snapshot field to be 0, because when we're
passing it to bch2_trans_get_iter() it's used for the snapshot we're
operating in, which should be 0 for most btrees that don't use
snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:07 -04:00
jpsollie
41e633826a bcachefs: add bcachefs xxhash support
xxhash is a much faster algorithm compared to crc32.
could be used to speed up checksum calculation.
xxhash 64-bit only, as it is much faster on 64-bit CPUs compared to xxh32.

Signed-off-by: jpsollie <janpieter.sollie@edpnet.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:07 -04:00
jpsollie
80ff5d18ee bcachefs: Prepare checksums for more advanced algorithms
Perform abstraction of hash calculation for advanced checksum algorithms.
Algorithms like xxhash do not store their state as a u64 int.

Signed-off-by: jpsollie <janpieter.sollie@edpnet.be>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:07 -04:00
Tobias Geerinckx-Rice
a515d0a50c bcachefs: Enforce SYS_CAP_ADMIN within ioctls
bch2_fs_ioctl() didn't distinguish between unsupported ioctls and those
which the current user is unauthorised to perform.  That kept the code
simple but meant that, for example, an unprivileged TIOCGWINSZ ioctl on
a bcachefs file would return -EPERM instead of the expected -ENOTTY.
The same call made by a privileged user would correctly return -ENOTTY.

Fix this discrepancy by moving the check for CAP_SYS_ADMIN into each
privileged ioctl function.

Signed-off-by: Tobias Geerinckx-Rice <me@tobias.gr>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:07 -04:00
Kent Overstreet
508b1f7139 bcachefs: Fix bch2_btree_iter_peek_prev()
In !BTREE_ITER_IS_EXTENTS mode, we shouldn't be looking at k->size, i.e.
we shouldn't use bkey_start_pos().

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:07 -04:00
Dan Robertson
31029f2f70 bcachefs: Fix bch2_acl_chmod() cleanup on error
Avoid calling kfree on the returned error pointer if
bch2_acl_from_disk fails.

Signed-off-by: Dan Robertson <dan@dlrobertson.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:07 -04:00