IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
The calculation for number of nodes to allocate in
bch2_btree_update_start() was incorrect - this fixes a BUG_ON() on the
small nodes test.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
since we currently don't have a good fault injection library,
bch2_btree_insert_node() was randomly injecting faults based on
local_clock().
At the very least this should have been a debug mode only thing, but
this is a brittle method so let's just delete it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Using drop_locks_do() ensures that every unlock() is paired with a
relock(), with proper error checking.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a new helper for the common pattern of:
- trans_unlock()
- do something
- trans_relock()
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
GFP_NOIO dates from the bcache days, when we operated under the block
layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to
GFP_NOFS.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
As suggested by Linus, this drops the six_lock_state union in favor of
raw bitmasks.
On the one hand, bitfields give more type-level structure to the code.
However, a significant amount of the code was working with
six_lock_state as a u64/atomic64_t, and the conversions from the
bitfields to the u64 were deemed a bit too out-there.
More significantly, because bitfield order is poorly defined (#ifdef
__LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the
sequence number would overflow into the rest of the bitfield if the
compiler didn't put the sequence number at the high end of the word.
The new code is a bit saner when we're on an architecture without real
atomic64_t support - all accesses to lock->state now go through
atomic64_*() operations.
On architectures with real atomic64_t support, we additionally use
atomic bit ops for setting/clearing individual bits.
Text size: 7467 bytes -> 4649 bytes - compilers still suck at
bitfields.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Seeing occasional test failures where we get stuck in a livelock that
involves this event - this will help track it down.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When we allocate disk space, we need to be incrementing the WRITE io
clock, which perhaps should be renamed to sectors allocated - copygc
uses this io clock to know when to run.
Also, we should be incrementing the same clock when allocating btree
nodes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Rust bindgen doesn't cope well with anonymous structs and unions. This
patch drops the fancy anonymous structs & unions in bkey_i that let us
use the same helpers for bkey_i and bkey_packed; since bkey_packed is an
internal type that's never exposed to outside code, it's only a minor
inconvenienc.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Rust bindgen doesn't do anonymous structs very nicely: BKEY_PADDED()
only needs the anonymous struct when it's used on the stack, to
guarantee layout, not when it's embedded in another struct.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned
device support is going to require a new allocation for every btree node
write. This is a bit of prep work.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This brings back journal_entries_compact(), but in a more efficient form
- we need to do multiple postprocess steps, so iterate over the
journal entries being written just once to make it more efficient.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds a debug mode where we split up the c->writes refcount into
distinct refcounts for every codepath that takes a reference, and adds
sysfs code to print the value of each ref.
This will make it easier to debug shutdown hangs due to refcount leaks.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The btree_node_write_blocked bit was a later addition to this code,
it only mirrors the state of the b->write_blocked list (empty or
nonempty) - unfortunately, when it was added it wasn't correctly kept in
sync - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In debug mode, we now track where btree iterators and paths are
initialized/allocated - helpful in tracking down btree path overflows.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch introduces
- bpos_eq()
- bpos_lt()
- bpos_le()
- bpos_gt()
- bpos_ge()
and equivalent replacements for bkey_cmp().
Looking at the generated assembly these could probably be improved
further, but we already see a significant code size improvement with
this patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
b->write_type needs to be set atomically with setting the
btree_node_need_write flag, so move it into b->flags.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This improves the bkey_format calculation when splitting btree nodes.
Previously, we'd use a format calculated for the original node for the
lower of the two new nodes.
This was particularly bad on sequential insertions, where we iteratively
split the last btree node, whos format has to include KEY_MAX.
Now, we calculate formats precisely for the keys the two new nodes will
contain. This also should make splitting a bit more efficient, since
we're only copying keys once (from the original node to the new node,
instead of new node, replacement node, then upper split).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This replaces sysfs btree_avg_write_size with btree_write_stats, which
now breaks out statistics by the source of the btree write.
Btree writes that are too small are a source of inefficiency, and
excessive btree resort overhead - this will let us see what's causing
them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Use __func__ in error messages that refer to function name, and do so
more uniformly - suggested by checkpatch.pl
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:
ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
(hopefully with docbook documentation), not .h
file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE
- we have _many_ x-macros and other macros where
we can't do this
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
btree nodes can be written by other threads (shrinker, journal reclaim)
with only a read lock, but brand new nodes should only be written by the
thread doing the split/interior update. bch2_btree_update_add_new_node()
sets btree node flags to indicate that this is a new node and should not
be written out by other threads, thus we need to call it before dropping
our write lock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, bch2_btree_update_start() would always take all intent
locks, all the way up to the root.
We've finally got data from users where this became a scalability issue
- so, this patch fixes bch2_btree_update_start() to only take the locks
we need.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now that we have an error path plumbed through, there's no need to be
using bch2_btree_node_lock_write_nofail().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The next patch in the series is (finally!) going to change btree splits
(and interior updates in general) to not take intent locks all the way
up to the root - instead only locking the nodes they'll need to modify.
However, this will be introducing a race since if we're not holding a
write lock on a btree node it can be written out by another thread, and
then we might not have enough space for a new bset entry.
We can handle this by retrying - we just need to introduce a new error
path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In order to avoid locking all btree nodes up to the root for btree node
splits, we're going to have to introduce a new error path into
bch2_btree_insert_node(); this mean we can't have done any writes or
modified global state before that point.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
btree_node_lock_nopath() is something we'd like to get rid of, it's
always prone to deadlocks if we accidentally are holding other locks,
because it doesn't mark the lock it's taking in a path: we'll want to
get rid of it in the future, but for now this patch works it by calling
bch2_trans_unlock().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This function is fairly small and only used in two places: one very hot,
the other cold, so it should definitely be inlined.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is a major oopsy - we should always be unlocking before calling
closure_sync(), else we'll cause a deadlock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We were checking for -EAGAIN, but we're not returned that when we didn't
pass a closure to wait with - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With the new deadlock cycle detector, it's critical that all held locks
be marked in a btree_path, because that's what the cycle detector
traverses - any locks that aren't correctly marked will cause deadlocks.
This changes the btree_path to allocate some btree_paths for the new
nodes, since until the final update is done we otherwise don't have a
path referencing them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Centralizing the transaction restart/tracepoint in
bch2_btree_path_upgrade() lets us improve the tracepoint - now it emits
old and new locks_want.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With the upcoming cycle detector, we have to be careful about using
btree_node_lock_nopath - in particular, using it to take write locks can
cause deadlocks.
All held locks need to be tracked in a btree_path, so that the cycle
detector knows about them - unless we know that we cannot cause
deadlocks for other reasons: e.g. we are only taking read locks, or
we're in very early fsck (topology repair).
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Ideally, all the code in btree_locking.c should be converted, but then
we'd want to convert btree_path to point to btree_key_cached_common too,
and then we'd be in for a much bigger cleanup - but a bit of incremental
cleanup will still be helpful for the next patches.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Taking a write lock will be able to fail, with the new cycle detector -
unless we pass it nofail, which is possible but not preferred.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the future, with the new deadlock cycle detector, we won't be using
bare six_lock_* anymore: lock wait entries will all be embedded in
btree_trans, and we will need a btree_trans context whenever locking a
btree node.
This patch plumbs a btree_trans to the few places that need it, and adds
two new locking functions
- btree_node_lock_nopath, which may fail returning a transaction
restart, and
- btree_node_lock_nopath_nofail, to be used in places where we know we
cannot deadlock (i.e. because we're holding no other locks).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
On failure to get a journal pre-reservation because we're called from
journal reclaim we're not supposed to return a transaction restart error
- this fixes a livelock.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Our types are exported to the tracepoint code, so it's not necessary to
break things out individually when passing them to tracepoints - we can
also call other functions from TP_fast_assign().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>