1217159 Commits

Author SHA1 Message Date
Kent Overstreet
160dff6dad bcachefs: Ratelimit ec error message
We should fix this, but for now this makes this more usable.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
2da671dc4a bcachefs: Use btree_type_has_ptrs() more consistently
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
6c22eb7085 bcachefs: Fix "multiple types of data in same bucket" with ec
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
22f5162133 bcachefs: Ensure fsck error is printed before panic
When errors=panic, we want to make sure we print the error before
calling bch2_inconsistent_error().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
8aaee94d46 bcachefs: Fix a deadlock in btree_update_nodes_written()
btree_node_lock_nopath() is something we'd like to get rid of, it's
always prone to deadlocks if we accidentally are holding other locks,
because it doesn't mark the lock it's taking in a path: we'll want to
get rid of it in the future, but for now this patch works around it by calling
bch2_trans_unlock().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
13bc41a715 bcachefs: bch2_trans_locked()
Useful debugging function.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
40a44873a5 bcachefs: Improve btree_deadlock debugfs output
This changes bch2_check_for_deadlock() to print the longest chains it
finds - when we have a deadlock because the cycle detector isn't finding
something, this will let us see what it's missing.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
943f9946a6 bcachefs: Don't quash error in bch2_bucket_alloc_set_trans()
We were incorrectly returning -BCH_ERR_insufficient_devices when we'd
received a different error from bch2_bucket_alloc_trans(), which
(erroneously) turns into -EROFS further up the call chain.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
685e0f0c47 bcachefs: Fix a trans path overflow in bch2_btree_delete_range_trans()
bch2_btree_delete_range_trans() was using btree_trans_too_many_iters()
to avoid path overflow, but this was buggy here (and also
btree_trans_too_many_iters() is suspect in general).

btree_trans_too_many_iters() only returns true when we're close to the
maximum number of paths - within 8 - but extent insert/delete assumes
that it can use more paths than that.

Instead, we need to call bch2_trans_begin() on every loop iteration.
Since we don't want to call bch2_trans_begin() (restarting the outer
transaction) if the call was a no-op - if we had no work to do - we have
to structure things a bit oddly.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
ae10fe017b bcachefs: bucket_alloc_state
This refactoring puts our various allocation path counters into a
dedicated struct - the upcoming nocow patch is going to add another
counter.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
29cea6f483 bcachefs: Fix bch2_btree_path_up_until_good_node()
There was a rare bug when path->locks_want was nonzero, but not
BTREE_MAX_DEPTH, where we'd return on a valid node that wasn't locked -
oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
e0eaf86259 bcachefs: Factor out bch2_write_drop_io_error_ptrs()
Move slowpath code to a separate, non-inline function.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
99e2146bea bcachefs: Break out bch2_btree_path_traverse_cached_slowpath()
Prep work for further refactoring.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
2d848dacb2 bcachefs: Kill io_in_flight semaphore
This used to be needed more for buffered IO, but now the block layer has
writeback throttling - we can delete this now.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
68b6cd194a bcachefs: Improve bucket_alloc tracepoint
It now includes more info - whether the bucket was for metadata or data
- and is now called in the same place as the bucket_alloc_fail
tracepoint.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
c298fd7d34 bcachefs: Mark __bch2_trans_iter_init as inline
This function is fairly small and only used in two places: one very hot,
the other cold, so it should definitely be inlined.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
25b4b3308e bcachefs: Inline fast path of check_pos_snapshot_overwritten()
This moves the slowpath of check_pos_snapshot_overwritten() to a
separate function, and inlines the fast path - helping performance on
btrees that don't use snapshots and for users that aren't using
snapshots.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
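
The fast path / slow path split described in the entry above (and in several neighbouring entries) can be sketched roughly as follows. This is only an illustration of the pattern, not the bcachefs code: `pos_overwritten()`, `overwritten_slowpath()`, the `btree_has_snapshots` flag, and the "odd IDs" rule are all hypothetical stand-ins, and `__attribute__((noinline))` assumes GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts slow-path entries so the effect of the split is observable. */
static unsigned slowpath_calls;

/* Hypothetical cold path, kept out of line so callers stay small. */
static __attribute__((noinline)) bool overwritten_slowpath(unsigned snapshot_id)
{
	/* The inline wrapper only sends non-zero IDs here. */
	assert(snapshot_id != 0);

	slowpath_calls++;
	/* Stand-in for an expensive lookup: pretend odd IDs were overwritten. */
	return snapshot_id & 1;
}

/* Inline fast path: bail out cheaply when snapshots are not in play. */
static inline bool pos_overwritten(bool btree_has_snapshots, unsigned snapshot_id)
{
	if (!btree_has_snapshots || !snapshot_id)
		return false;

	return overwritten_slowpath(snapshot_id);
}
```

Callers that never touch snapshots pay only for the two cheap checks; the function call happens only when the expensive check is actually needed.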
Kent Overstreet
c23a9e0882 bcachefs: Improve jset_validate()
Previously, jset_validate() was formatting the initial part of an error
string for every entry it was validating - expensive.

This moves that code to journal_entry_err_msg(), which is now only
called if there's an actual error.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
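
As a rough illustration of the idea in the entry above (format the error text only once an error is known), here is a small userspace sketch. The `struct entry` layout, `entry_err_msg()`, `validate_entries()`, and the "empty entry" check are assumptions made for the example; they are not the actual jset_validate() or journal_entry_err_msg() code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified journal entry. */
struct entry {
	unsigned type;
	unsigned u64s;
};

/* Counts how often we paid for formatting. */
static unsigned nr_formatted;

/* Cold path: build the message only after a check has failed. */
static void entry_err_msg(char *out, size_t size, unsigned idx,
			  const struct entry *e, const char *what)
{
	nr_formatted++;
	snprintf(out, size, "entry %u (type %u): %s", idx, e->type, what);
}

/* Hot path: no string formatting at all while entries are valid. */
static int validate_entries(const struct entry *e, unsigned nr,
			    char *err, size_t size)
{
	assert(err && size);

	for (unsigned i = 0; i < nr; i++) {
		if (!e[i].u64s) {
			entry_err_msg(err, size, i, &e[i], "empty entry");
			return -1;
		}
	}
	return 0;
}
```

With this shape, validating a journal full of good entries does no formatting work; the cost is paid once, on the first bad entry.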
Kent Overstreet
3f3bc66ef0 bcachefs: Optimize btree_path_alloc()
 - move slowpath code to a separate function, btree_path_overflow()
 - no need to use hweight64
 - copy nr_max_paths from btree_transaction_stats to btree_trans,
   avoiding a data dependency in the fast path

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
14d8f26ad0 bcachefs: Inline bch2_trans_kmalloc() fast path
Small performance optimization.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:42 -04:00
Kent Overstreet
f3b8403ee7 bcachefs: Run bch2_fs_counters_init() earlier
We need counters to be initialized before initializing shrinkers - the
shrinker callbacks will update those counters. This fixes a segfault in
userspace.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
d704d62355 bcachefs: btree_err() now uses bch2_print_string_as_lines()
We've seen long error messages get truncated here, so convert to the new
bch2_print_string_as_lines().

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
dbb9936b0d bcachefs: Improve bch2_fsck_err()
 - factor out fsck_err_get()
 - if the "bcachefs (%s):" prefix has already been applied, don't
   duplicate it
 - convert to printbufs instead of static char arrays
 - tidy up control flow a bit
 - use bch2_print_string_as_lines(), to avoid messages getting truncated

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
a8f3542843 bcachefs: bch2_print_string_as_lines()
This adds a helper for printing a large buffer one line at a time, to
avoid the 1k printk limit.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
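
The helper described in the entry above prints a large buffer one line at a time. A minimal userspace sketch of that technique might look like the following; the callback type `line_sink_fn`, the function name `print_string_as_lines()` as written here, and `stdout_sink()` are illustrative assumptions, and the kernel helper's real signature may well differ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sink: receives exactly one line (without the newline). */
typedef void (*line_sink_fn)(const char *line, size_t len, void *ctx);

/*
 * Walk buf and hand each newline-separated segment to sink separately,
 * so no single call ever sees more than one line.
 * Returns the number of lines emitted.
 */
static size_t print_string_as_lines(const char *buf, line_sink_fn sink, void *ctx)
{
	size_t nr = 0;

	assert(buf && sink);

	while (*buf) {
		const char *nl = strchr(buf, '\n');
		size_t len = nl ? (size_t)(nl - buf) : strlen(buf);

		sink(buf, len, ctx);
		nr++;

		buf += len;
		if (*buf == '\n')
			buf++;	/* skip the separator */
	}
	return nr;
}

/* Example sink: write the line to stdout and track the longest line seen. */
static void stdout_sink(const char *line, size_t len, void *ctx)
{
	size_t *longest = ctx;

	if (len > *longest)
		*longest = len;
	printf("%.*s\n", (int)len, line);
}
```

Because each call to the sink carries a single line, a per-call length limit applies to individual lines rather than to the whole message.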
Kent Overstreet
e9174370d0 bcachefs: bch2_btree_node_relock_notrace()
Most of the node_relock_fail trace events are generated from
bch2_btree_path_verify_level(), when debugcheck_iterators is enabled -
but we're not interested in these trace events, they don't indicate that
we're in a slowpath.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
c36ff038fd bcachefs: bch2_btree_cache_scan() improvement
We're still seeing OOM issues caused by the btree node cache shrinker
not sufficiently freeing memory: thus, this patch changes the shrinker
to not exit if __GFP_FS was not supplied.

Instead, tweak btree node memory allocation so that we never invoke
memory reclaim while holding the btree node cache lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
c6cf49a95a bcachefs: Fix blocking with locks held
This is a major oopsy - we should always be unlocking before calling
closure_sync(), else we'll cause a deadlock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
01ed3359b2 bcachefs: btree_update_nodes_written() needs BTREE_INSERT_USE_RESERVE
This fixes an obvious deadlock - whoops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
d602657cd1 bcachefs: Fix error handling in bch2_btree_update_start()
We were checking for -EAGAIN, but that isn't what gets returned when we
don't pass a closure to wait on - oops.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
afbc719468 bcachefs: Improve bch2_btree_trans_to_text()
This is just a formatting/readability improvement.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
8b31e4fc7d bcachefs: Kill normalize_read_intent_locks()
Before we had the deadlock cycle detector, we didn't want to be holding
read locks when taking intent locks, because blocking on an intent lock
while holding a read lock was a lock ordering violation that could
cause a deadlock.

With the cycle detector this is no longer an issue, so this code can be
deleted.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:41 -04:00
Kent Overstreet
2ec254c098 bcachefs: Ensure bch2_btree_node_lock_write_nofail() never fails
In order for bch2_btree_node_lock_write_nofail() to never produce a
deadlock, we must ensure we're never holding read locks when using it.
Fortunately, it's only used from code paths where any read locks may be
safely dropped.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
0d7009d7ca bcachefs: Delete old deadlock avoidance code
This deletes our old lock ordering based deadlock avoidance code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:41 -04:00
Kent Overstreet
96d994b37c bcachefs: Print deadlock cycle in debugfs
In the event that we're not finished debugging the cycle detector, this
adds a new file to debugfs that shows what the cycle detector finds, if
anything. By comparing this with btree_transactions, which shows held
locks for every btree_transaction, we'll be able to determine if it's
the cycle detector that's buggy or something else.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
33bd5d0686 bcachefs: Deadlock cycle detector
We've outgrown our own deadlock avoidance strategy.

The btree iterator API provides an interface where the user doesn't need
to concern themselves with lock ordering - different btree iterators can
be traversed in any order. Without special care, this will lead to
deadlocks.

Our previous strategy was to define a lock ordering internally, and
whenever we attempt to take a lock and trylock() fails, we'd check if
the current btree transaction is holding any locks that cause a lock
ordering violation. If so, we'd issue a transaction restart, and then
bch2_trans_begin() would re-traverse all previously used iterators, but
in the correct order.

That approach had some issues, though.
 - Sometimes we'd issue transaction restarts unnecessarily, when no
   deadlock would have actually occurred. Lock ordering restarts have
   become our primary cause of transaction restarts, on some workloads
   totalling 20% of actual transaction commits.

 - To avoid deadlock or livelock, we'd often have to take intent locks
   when we only wanted a read lock: with the lock ordering approach, it
   is actually illegal to hold _any_ read lock while blocking on an intent
   lock, and this has been causing us unnecessary lock contention.

 - It was getting fragile - the various lock ordering rules are not
   trivial, and we'd been seeing occasional livelock issues related to
   this machinery.

So, since bcachefs is already a relational database masquerading as a
filesystem, we're stealing the next traditional database technique and
switching to a cycle detector for avoiding deadlocks.

When we block taking a btree lock, after adding ourself to the waitlist
but before sleeping, we do a DFS of btree transactions waiting on other
btree transactions, starting with the current transaction and walking
our held locks, and transactions blocking on our held locks.

If we find a cycle, we emit a transaction restart. Occasionally (e.g.
the btree split path) we cannot allow the lock() operation to fail, so
if necessary we'll tell another transaction that it has to fail.

Result: trans_restart_would_deadlock events are reduced by a factor of
10 to 100, and we'll be able to delete a whole bunch of grotty, fragile
code.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:41 -04:00
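
The DFS described in the entry above can be sketched in a much-reduced form. This sketch assumes the held locks and waitlists have already been collapsed into a small matrix, where `blocks[a][b]` means transaction `a` holds a lock that transaction `b` is waiting for; `struct wait_graph`, `MAX_TRANS`, and `would_deadlock()` are hypothetical names. It does not cover the real detector's handling of lock types, waitlist locking, or choosing which transaction to abort.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_TRANS 8

/* Hypothetical graph: blocks[a][b] means a holds a lock that b waits on. */
struct wait_graph {
	bool blocks[MAX_TRANS][MAX_TRANS];
};

/* Depth-first walk from cur through transactions blocked on cur's locks. */
static bool dfs(const struct wait_graph *g, int start, int cur, bool *visited)
{
	for (int next = 0; next < MAX_TRANS; next++) {
		if (!g->blocks[cur][next])
			continue;
		if (next == start)
			return true;	/* walked back to ourselves: cycle */
		if (visited[next])
			continue;

		visited[next] = true;
		if (dfs(g, start, next, visited))
			return true;
	}
	return false;
}

/* Would transaction start be part of a cycle if it went to sleep now? */
static bool would_deadlock(const struct wait_graph *g, int start)
{
	bool visited[MAX_TRANS] = { false };

	assert(start >= 0 && start < MAX_TRANS);

	visited[start] = true;
	return dfs(g, start, start, visited);
}
```

In this model a caller would run `would_deadlock()` after registering itself as a waiter and before sleeping, and restart the transaction if it returns true.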
Kent Overstreet
62448afee7 bcachefs: Fix bch2_btree_node_upgrade()
Previously, if we were trying to upgrade from a read to an intent lock
but we held an additional read lock via another btree_path,
bch2_btree_node_upgrade() would always fail, in six_lock_tryupgrade().

This patch factors out the code that __bch2_btree_node_lock_write() uses
to temporarily drop extra read locks, so that six_lock_tryupgrade() can
succeed.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2023-10-22 17:09:41 -04:00
Kent Overstreet
845cffed0d bcachefs: Add a debug assert
Chasing down a strange locking bug.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
84a37cbf62 six locks: Wakeup now takes lock on behalf of waiter
This brings back an important optimization, to avoid touching the wait
lists an extra time, while preserving the property that a thread is on a
lock waitlist iff it is waiting - it is never removed from the waitlist
until it has the lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
e4b7254c75 six locks: Fix a lost wakeup
There was a lost wakeup between a read unlock in percpu mode and a write
lock. The unlock path unlocks, then executes a barrier, then checks for
waiters; correspondingly, the lock side should set the wait bit and
execute a barrier, then attempt to take the lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
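
The ordering rule in the entry above (unlock, barrier, check for waiters on one side; set the wait bit, barrier, attempt the lock on the other) can be illustrated with C11 atomics. This is a toy model under stated assumptions: `struct toy_lock` and both functions are hypothetical, a single reader count stands in for the percpu counters, and actually taking the write lock and sleeping are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock: a reader count plus a "writer is waiting" bit. */
struct toy_lock {
	atomic_int  readers;
	atomic_bool waiting;
};

/*
 * Unlock side: drop the read lock, execute a barrier, then check for
 * waiters. Returns true if the caller must issue a wakeup.
 */
static bool toy_read_unlock(struct toy_lock *l)
{
	int prev = atomic_fetch_sub_explicit(&l->readers, 1,
					     memory_order_release);
	assert(prev > 0);

	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&l->waiting, memory_order_relaxed);
}

/*
 * Lock side: publish the wait bit, execute a barrier, then look at the
 * readers. Returns true if no readers were seen (the writer may go ahead),
 * false if the writer should sleep with the wait bit still set.
 */
static bool toy_writer_may_proceed(struct toy_lock *l)
{
	atomic_store_explicit(&l->waiting, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&l->readers, memory_order_relaxed) == 0) {
		atomic_store_explicit(&l->waiting, false, memory_order_relaxed);
		return true;
	}
	return false;
}
```

With a full barrier on both sides, at least one of them observes the other's store: either the writer sees the reader count already at zero, or the unlocking reader sees the wait bit and knows to wake the writer.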
Kent Overstreet
5b254da573 six locks: Enable lockdep
Now that we have lockdep_set_no_check_recursion(), we can enable lockdep
checking.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
f6ea2d575d six locks: Add start_time to six_lock_waiter
This is needed by the cycle detector in bcachefs - we need a way to
iterate over waitlist entries while dropping and retaking the waitlist
lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
0bfb9f42b7 six locks: six_lock_waiter()
This allows passing in the wait list entry - to be used for a deadlock
cycle detector.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:41 -04:00
Kent Overstreet
ebc6f76a66 six locks: Simplify wait lists
This switches to a single list of waiters, instead of separate lists for
read and intent, and switches write locks to also use the wait lists
instead of being handled differently.

Also, removal from the wait list is now done by the process waiting on
the lock, not the process doing the wakeup. This is needed for the new
deadlock cycle detector - we need tasks to stay on the waitlist until
they've successfully acquired the lock.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
098ef98d5b bcachefs: Add private error codes for ENOSPC
Continuing the saga of introducing private dedicated error codes for
each error path, this patch converts ENOSPC to error codes that are
subtypes of ENOSPC. We've recently had a test failure where we got
-ENOSPC where we shouldn't have, and didn't have enough information to
tell where it came from, so this patch will solve that problem.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
5c1ef830f6 bcachefs: Errcodes can now subtype standard error codes
The next patch is going to be adding private error codes for all the
places we return -ENOSPC.

Additionally, this patch updates return paths at all module boundaries
to call bch2_err_class(), to return the standard error code.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
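
One way to picture error codes that subtype standard ones, as the two entries above describe, is a table mapping each private code to its parent. The sketch below is an assumption-laden illustration: the enum values, `err_parent[]`, `err_class()`, and `err_matches()` are hypothetical and are not taken from the bcachefs sources, which expose this through bch2_err_class().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical private codes, numbered above the standard errno range. */
enum private_err {
	ERR_START = 2048,
	ERR_ENOSPC_bucket_alloc,
	ERR_ENOSPC_disk_reservation,
	ERR_MAX,
};

/* Maps each private code to the code it subtypes. */
static const int err_parent[ERR_MAX - ERR_START] = {
	[ERR_ENOSPC_bucket_alloc - ERR_START]     = ENOSPC,
	[ERR_ENOSPC_disk_reservation - ERR_START] = ENOSPC,
};

static bool is_private(int e)
{
	return e > ERR_START && e < ERR_MAX;
}

/* Takes a negative error; returns the negative standard errno it belongs to. */
static int err_class(int err)
{
	int e = -err;

	assert(err <= 0);

	while (is_private(e))
		e = err_parent[e - ERR_START];
	return -e;
}

/* True if err is class itself or one of its subtypes. */
static bool err_matches(int err, int class)
{
	for (int e = -err; e > 0; e = err_parent[e - ERR_START]) {
		if (e == class)
			return true;
		if (!is_private(e))
			break;
	}
	return false;
}
```

Internal code can then log or test the precise private code, while anything returned across a module boundary goes through `err_class()` so callers only ever see a standard errno.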
Kent Overstreet
57ce827442 bcachefs: Make an assertion more informative
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
e4215d0fec bcachefs: All held locks must be in a btree path
With the new deadlock cycle detector, it's critical that all held locks
be marked in a btree_path, because that's what the cycle detector
traverses - any locks that aren't correctly marked will cause deadlocks.

This changes the btree_path to allocate some btree_paths for the new
nodes, since until the final update is done we otherwise don't have a
path referencing them.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
367d72dd5f bcachefs: bch2_btree_path_upgrade() now emits transaction restart
Centralizing the transaction restart/tracepoint in
bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits
old and new locks_want.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
b8eec67591 bcachefs: Add a manual trigger for lock wakeups
Spotted a lockup once that appeared to be a lost wakeup. Adding a manual
trigger for lock wakeups will make it easy to tell if that's what it is
next time it occurs.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00
Kent Overstreet
5a82c7c7d1 bcachefs: Fix sb_field_counters formatting
We have counters with longer names now, so adjust the tabstop - also,
make sure there's always a space printed between the name and the
number.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22 17:09:40 -04:00