Previously, when a reflink pointer pointed to a missing indirect extent, we
replaced the entire reflink pointer with an error key. This patch replaces
only the missing range with an error key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This improves __bch2_trans_commit - early in the recovery process, when
we're running btree_gc and before we want to go RW, it now uses
bch2_journal_key_insert() to add the update to the list of updates for
journal replay to do, instead of btree_gc having to use separate
interfaces depending on whether we're running at bringup or, later,
runtime.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch was originally to work around the journal getting stuck in
nochanges mode - but that was just a hack, we needed to fix the actual
bug. It should be fixed now, so revert it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The journal can get stuck if we need to get a journal reservation for
something we have a pre-reservation for, but aren't able to reclaim
space, or if the pin fifo is full - it's impractical to resize the pin
fifo at runtime.
Previously, we reserved 8 entries in the pin fifo for pre-reservations,
but that seems small - we're seeing the journal occasionally get stuck.
Let's reserve a quarter of it.
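As a standalone illustration of the sizing change (the fifo size below is
made up, not taken from the source):

    #include <stdio.h>

    int main(void)
    {
        unsigned fifo_size    = 1024;          /* hypothetical pin fifo size */
        unsigned old_reserved = 8;             /* previous fixed reservation */
        unsigned new_reserved = fifo_size / 4; /* a quarter of the fifo */

        printf("entries reserved for pre-reservations: old %u, new %u\n",
               old_reserved, new_reserved);
        return 0;
    }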
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
BTREE_NODE_SEQ() is supposed to give us a time ordering of btree nodes
on disk, so that we can tell which btree node is newer if we ever have
to scan the entire device to find btree nodes.
The btree node merge path wasn't setting it correctly on the new node -
oops.
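A minimal sketch of the intended invariant, using hypothetical types and
field names rather than the bcachefs code: the node produced by a merge
should carry a sequence number greater than either input, so a raw device
scan can tell it is the newer node.

    struct example_node {
        unsigned long seq;      /* stand-in for BTREE_NODE_SEQ() */
    };

    static void set_merged_node_seq(struct example_node *n,
                                    const struct example_node *l,
                                    const struct example_node *r)
    {
        unsigned long max_seq = l->seq > r->seq ? l->seq : r->seq;

        n->seq = max_seq + 1;   /* strictly newer than both merged nodes */
    }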
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Long ago it was possible to get a journal reservation and not use it,
but that's no longer allowed, which means journal_write_compact() has
very little work to do, and isn't really worth the code anymore.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This helps with lock contention in the journalling code: instead of
updating our journal pin on every write, only get a journal pin if we
don't have one.
This means we can avoid hammering on journal locks nearly so much, at
the cost of carrying around a journal pin for an older entry than the
one we actually need. To handle that, if needed we update our journal
pin to the correct one when flushed by journal reclaim.
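A hedged pseudocode sketch of the idea, with hypothetical types and names
rather than the real bcachefs journal pin interfaces: take a pin only when
none is held, and let reclaim's flush callback advance it later.

    struct example_pin {
        unsigned long seq;          /* 0 means no pin held */
    };

    static void maybe_take_pin(struct example_pin *pin, unsigned long want_seq,
                               void (*pin_add)(unsigned long seq))
    {
        if (pin->seq)               /* already pinned, possibly an older entry */
            return;

        pin_add(want_seq);          /* the only path that touches journal locks */
        pin->seq = want_seq;
    }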
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now, when outputting to printbufs, we can set tabstops and left or right
justify text to them - this is to be used by the userspace 'bcachefs fs
usage' command.
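A standalone illustration of the idea using plain printf field widths, not
the printbuf API itself: left justifying labels and right justifying values
to a fixed column is what gives 'bcachefs fs usage' aligned output.

    #include <stdio.h>

    int main(void)
    {
        /* hypothetical report lines; the column widths stand in for tabstops */
        printf("%-16s%10s\n", "capacity:", "1.0 TiB");
        printf("%-16s%10s\n", "used:",     "512 GiB");
        return 0;
    }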
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
In move_read_endio, we were checking whether the next pending write had its
read completed - but this could turn into a use after free (and we were
accessing the list without a lock), so it's better to just do the wakeup
unconditionally.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch improves the superblock .to_text() methods and adds methods
for all types that were missing them. It also improves printbufs by
allowing them to specify what units we want to be printing in, and adds
new wrapper methods for unifying our kernel and userspace environments.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch_scnmemcpy was for printing length-limited strings that might not
have a terminating null - turns out sprintf & pr_buf can do this with
%.*s.
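For example, a buffer with no terminating NUL can be printed safely by
passing an explicit length:

    #include <stdio.h>

    int main(void)
    {
        char buf[4] = { 'a', 'b', 'c', 'd' };   /* not null-terminated */

        /* %.*s prints at most the given number of bytes, so no
         * terminating NUL is needed */
        printf("%.*s\n", (int) sizeof(buf), buf);
        return 0;
    }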
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When the nochanges option is selected, we're supposed to never issue
writes. Unfortunately, it seems discards were missed when implementing
this, leading to some painful filesystem corruption.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Add an option that tells recovery to only read the journal, to be used
by the list_journal command.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This is prep work for the next patch, which is going to change
__bch2_trans_commit() to use bch2_journal_key_insert() when very early
in the recovery process, so that we have a unified interface for doing
btree updates.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When viewing what's in the journal, it's more useful to have the logical
location - journal bucket and offset within that bucket - than just the
offset on that device.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Apparently it actually is possible for crypto_skcipher_encrypt() to
return an error - not sure why that would be - but we need to replace
our assertion with actual error handling.
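A minimal sketch of the change, assuming a prepared skcipher_request (the
wrapper name is hypothetical, not the bcachefs function): check and
propagate the return value of crypto_skcipher_encrypt() instead of asserting
that it is zero.

    #include <crypto/skcipher.h>

    static int example_do_encrypt(struct skcipher_request *req)
    {
        int ret = crypto_skcipher_encrypt(req);

        if (ret)                /* previously: assumed to never happen */
            return ret;
        return 0;
    }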
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The error code when we fail to allocate a node in the btree node cache
doesn't make it to bch2_btree_path_traverse_all(). Instead, we need to
stash a flag in btree_trans so we know we have to take the cannibalize
lock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch2_dev_lookup() is used from the extended attribute set methods, for
setting the target options, where we're already holding an inode lock -
it turns out pathname lookups also take inode locks, so that was
susceptible to deadlocks.
Fortunately we already stash the device name in ca->name. This does
change user-visible behaviour though: instead of specifying e.g.
/dev/sda1, users must now specify sda1.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Before we had dedicated gc code for bucket->oldest_gen this was
btree_gc's responsibility, but now that we have that we can rip it out,
simplifying the already overcomplicated btree_gc.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This improves the formatting of journal_entry_btree_keys_to_text() by
putting each key on its own line.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The loop that traverses paths in traverse_all() needs to be a little bit
tricky, because traversing a path can cause other paths to be added (or
perhaps removed) at about the same position.
The old logic was buggy, replace it with simpler logic.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Some of our tracepoints were calling snprintf("%pS") - which does symbol
table lookups - in TP_fast_assign(), which turns out to be a really bad
idea.
This was done because perf trace wasn't correctly printing tracepoints
that use %pS anymore - but it turns out trace-cmd does handle it
correctly.
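A hedged fragment of the general pattern, as it might appear in a
hypothetical trace header (the event name and arguments are made up): store
only the raw instruction pointer in TP_fast_assign() and leave the %pS
symbol lookup to TP_printk(), which runs at print time.

    TRACE_EVENT(example_transaction_restart,
        TP_PROTO(unsigned long caller_ip),
        TP_ARGS(caller_ip),

        TP_STRUCT__entry(
            __field(unsigned long, caller_ip)
        ),

        TP_fast_assign(
            __entry->caller_ip = caller_ip;    /* no snprintf("%pS") here */
        ),

        TP_printk("%pS", (void *) __entry->caller_ip)
    );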
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Since we retry reads when we discover we read from a pointer that went
stale, if a dirty pointer is erroneously stale it would cause us to loop
retrying that read forever - unless we check before issuing the read,
while the btree is still locked, when we know that a dirty pointer
should never be stale.
This patch adds that check, along with printing some helpful debug info.
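A hedged sketch of the shape of the check, with hypothetical types and
fields rather than the real extent pointer code: while the btree is still
locked, a stale dirty pointer is a hard error rather than something to
retry.

    struct example_ptr {
        bool cached;
        bool stale;
    };

    static int check_ptrs_before_read(const struct example_ptr *ptrs, unsigned nr)
    {
        for (unsigned i = 0; i < nr; i++)
            if (!ptrs[i].cached && ptrs[i].stale) {
                /* a dirty pointer must never be stale while we hold the
                 * btree locks - print debug info and bail out instead of
                 * retrying the read forever */
                return -EIO;
            }

        return 0;
    }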
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
__bch2_btree_node_lock() was implementing the wrong lock ordering for
cached vs. non cached paths - this fixes it to match the btree path sort
order as defined by __btree_path_cmp(), and also simplifies the code
some.
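A hedged sketch of what a consistent sort order looks like (the field order
here is illustrative; the authoritative order is whatever __btree_path_cmp()
implements): lock acquisition simply has to follow the same total order for
cached and non-cached paths.

    struct example_path {
        unsigned      btree_id;
        unsigned      cached;   /* key cache paths sort separately */
        unsigned long pos;
    };

    static int example_path_cmp(const struct example_path *l,
                                const struct example_path *r)
    {
        if (l->btree_id != r->btree_id)
            return l->btree_id < r->btree_id ? -1 : 1;
        if (l->cached != r->cached)
            return l->cached < r->cached ? -1 : 1;
        return l->pos < r->pos ? -1 : l->pos > r->pos;
    }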
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This consolidates some of the btree node lock path, so that when we're
blocked taking a write lock on a node it shows up in
bch2_btree_trans_to_text(), along with intent and read locks.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We were emitting two trace events on transaction restart in this code
path - delete the redundant one.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We need to ensure we don't have any btree locks held when calling
do_pending_writes() - besides issuing IOs, upcoming allocator changes
will have allocations doing btree lookups directly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The old .debugcheck methods are no more and this just calls the .invalid
method, which doesn't add much since we already check that when doing
btree updates and when reading metadata in.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The check_dirents pass handles transaction restarts at the toplevel -
check_subdir_count() was incorrectly handling transaction restarts
without returning -EINTR, meaning that the iterator pointing to the
dirent being checked was left invalid.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The reflink repair code was incorrectly inserting a nonzero deleted key
via journal replay - this is due to bch2_journal_key_insert() being
somewhat hacky, and so this fix is also hacky for now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Like the previous patches, this converts bch2_gc_gens() to use the alloc
btree directly, and private arrays of generation numbers for its own
recalculation of oldest_gen.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This converts the copygc code to use the alloc btree directly to find
buckets that need to be evacuated instead of the in-memory bucket array,
which is finally going away soon.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This changes the btree_gc code to only use the second bucket array, the
one dedicated to GC. On completion, it compares what's in its in-memory
bucket array to the allocation information in the btree and writes it
directly, instead of updating the main in-memory bucket array and
writing that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- Updates to non key cache iterators will now be transparently
redirected to the key cache for cached btrees.
- Except when creating new keys: then the update goes to the underlying
btree.
For iterating over a cached btree to work, we need to ensure that if
a key exists in the key cache, it also exists in the btree - otherwise
the iterator code will skip past it and not check the key cache.
Otherwise, for consistency, all updates should go to the same place -
the key cache.
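A hedged pseudocode sketch of the dispatch rule described above, with
hypothetical names: updates to a cached btree go through the key cache,
except that a newly created key must first land in the underlying btree so
iteration can find it.

    static int dispatch_update(bool btree_is_cached, bool key_exists_in_btree,
                               int (*update_key_cache)(void),
                               int (*update_btree)(void))
    {
        if (btree_is_cached && key_exists_in_btree)
            return update_key_cache();  /* normal case: go through the key cache */

        return update_btree();          /* new keys go to the underlying btree */
    }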
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is the start of cache coherency with the btree key cache - this
adds a btree iterator flag that causes lookups to also check the key
cache when we're iterating over the btree (not iterating over the key
cache).
Note that we could still race with another thread creating an item in
the key cache and updating it, since we aren't holding the key cache
locked if it wasn't found. The next patch for the update path will
address this by causing the transaction to restart if the key cache is
found to be dirty.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, when doing updates and running triggers before journal
replay completes, triggers would see the incorrect key for the old key
being overwritten - this patch updates the trigger code to check the
journal keys when necessary, needed for the upcoming allocator rewrite.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We currently need to call bch2_btree_path_peek_slot() multiple times in
the transaction commit path - and some of those need to be updated to
also check the keys from journal replay, too. Let's consolidate this and
stash the key being overwritten in btree_insert_entry.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Add a new helper that returns true if the given btree ID uses the btree
key cache. This enables some new cleanups, since the helper can check
the options for whether caching is enabled on a given btree.
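A hedged sketch of such a predicate with made-up identifiers (not the actual
helper): concentrating the per-btree decision in one place is what enables
the cleanups.

    enum example_btree_id {
        EXAMPLE_BTREE_alloc,
        EXAMPLE_BTREE_inodes,
        EXAMPLE_BTREE_extents,
    };

    static bool example_btree_uses_key_cache(enum example_btree_id id)
    {
        switch (id) {
        case EXAMPLE_BTREE_alloc:
        case EXAMPLE_BTREE_inodes:
            return true;        /* cached btrees in this sketch */
        default:
            return false;
        }
    }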
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
btree_key_cache_flush_pos() uses BTREE_ITER_CACHED_NOFILL - but it
wasn't checking for !ck->valid. It does check for the entry being dirty,
so it shouldn't matter, but this patch refactors it a bit and adds an
assertion.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We were double-freeing old_buckets and not freeing old_buckets_gens:
also, the code was supposed to free buckets, not old_buckets;
old_buckets is only needed because we have to use rcu_assign_pointer()
instead of swap(), and won't be set if we hit the error path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
These nodes aren't reachable by other threads, so there's no need to
keep them locked - and this fixes a bug with the assertion in
bch2_trans_unlock() firing on transaction restart.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Change the error messages in bch2_inconsistent_error() and
bch2_fatal_error() so we can distinguish them.
Also, prefer bch2_fs_fatal_error() (which also logs an error message) to
bch2_fatal_error(), and change a call to bch2_inconsistent_error() to
bch2_fatal_error() when we can't continue.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Previously, bucket fragmentation was considered to be bucket size -
total amount of live data, both dirty and cached.
This meant that if a bucket was full but only a small amount of the data in
it was dirty (the rest cached), we'd get stuck: copygc wouldn't move the
dirty data out of the bucket and the allocator wouldn't be able to
invalidate and drop the cached data.
This changes fragmentation to exclude cached data, so that copygc will
evacuate these buckets and copygc/the allocator will always be able to
make forward progress.
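A worked example with made-up numbers showing why the new definition
unsticks copygc:

    #include <stdio.h>

    int main(void)
    {
        unsigned bucket_size = 512;     /* sectors, hypothetical */
        unsigned dirty       = 32;      /* live, dirty sectors */
        unsigned cached      = 480;     /* clean, cached sectors */

        /* old: cached data counted as live, so a full bucket looked like
         * it had nothing reclaimable */
        unsigned old_frag = bucket_size - (dirty + cached);     /* 0 */

        /* new: only dirty data counts, so copygc sees this bucket as
         * mostly reclaimable and will evacuate it */
        unsigned new_frag = bucket_size - dirty;                /* 480 */

        printf("fragmentation: old %u, new %u\n", old_frag, new_frag);
        return 0;
    }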
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>