IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
btree_node_write_if_need() kicks off a btree node write only if
need_write is set; this makes the locking easier to reason about by
moving the check into the cmpxchg loop in __bch2_btree_node_write().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
There was a rare recursive locking bug, in __bch2_btree_node_write()
nowrite path -> btree_node_write_done(), in the path that kicks off
another write.
This splits out an inner __btree_node_write_done() that expects to be
run with the btree node lock held.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
In sysfs, files can only output at most PAGE_SIZE. This is a problem for
debug info that needs to list an arbitrary number of times, and because
of this limit some of our debug info has been terser and harder to read
than we'd like.
This patch moves info about journal pins and cached btree nodes to
debugfs, and greatly expands and improves the output we return.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This cacheline aligns struct journal, and puts j->reservations and
j->prereserved on their own cacheline - we may want to split them up in
a separate patch.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In btree_update_interior.c, we were changing a path's level directly -
which affects path sort order - without re-sorting paths, leading to
assertions when bch2_path_get() verified paths were sorted correctly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
journal_flush_done() was overwriting did_work, thus occasionally
returning false when it did do work and occasional assertions in the
shutdown sequence because we didn't completely flush the key cache.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch changes printbufs dynamically allocate and reallocate a
buffer as needed. Stack usage has become a bit of a problem, and a major
cause of that has been static size string buffers on the stack.
The most involved part of this refactoring is that printbufs must now be
exited with printbuf_exit().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We're hitting a strange bug with transaction paths not being sorted
correctly - this dumps transaction paths in the order we thought was
sorted, which will hopefully shed some light as to what's going on.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When key cache pins were put onto their own list, we neglected to update
bch2_journal_pins_to_text() to print them.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch2_trans_begin() invalidates all iterators, until they're revalidated
by calling peek() or traverse().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Triggers can generate additional btree updates - we need to run alloc
triggers after all other triggers have run, because they generate
updates for the alloc btree.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Upcoming patches are doing more work on the triggers code, this patch
just moves code around.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We're now coming up with triggers that modify the update being done. A
bkey_s_c is const - bkey_i is the correct type to be using here.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
vstruct_bytes() was returning a u64 - it should be a size_t, the corect
type for the size of anything that fits in memory.
Also replace a 64 bit divide with div_u64().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
On btree node read error, it's helpful to see what we were trying to
read - was it all zeroes?
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When a reflink pointer points to a missing indirect extent, we replace
it with an error key. Instead of replacing the entire reflink pointer
with an error key, this patch replaces only the missing range with an
error key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This improves __bch2_trans_commit - early in the recovery process, when
we're running btree_gc and before we want to go RW, it now uses
bch2_journal_key_insert() to add the update to the list of updates for
journal replay to do, instead of btree_gc having to use separate
interfaces depending on whether we're running at bringup or, later,
runtime.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch was originally to work around the journal geting stuck in
nochanges mode - but that was just a hack, we needed to fix the actual
bug. It should be fixed now, so revert it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The journal can get stuck if we need to get a journal reservation for
something we have a pre-reservation for, but aren't able to reclaim
space, or if the pin fifo is full - it's impractical to resize the pin
fifo at runtime.
Previously, we reserved 8 entries in the pin fifo for pre-reservations,
but that seems small - we're seeing the journal occasionally get stuck.
Let's reserve a quarter of it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
BTREE_NODE_SEQ() is supposed to give us a time ordering of btree nodes
on disk, so that we can tell which btree node is newer if we ever have
to scan the entire device to find btree nodes.
The btree node merge path wasn't setting it correctly on the new node -
oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Long ago it was possible to get a journal reservation and not use it,
but that's no longer allowed, which means journal_write_compact() has
very little work to do, and isn't really worth the code anymore.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This helps with lock contention in the journalling code: instead of
updating our journal pin on every write, only get a journal pin if we
don't have one.
This means we can avoid hammering on journal locks nearly so much, at
the cost of carrying around a journal pin for an older entry than the
one we actually need. To handle that, if needed we update our journal
pin to the correct one when flushed by journal reclaim.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now, when outputting to printbufs, we can set tabstops and left or right
justify text to them - this is to be used by the userspace 'bcachefs fs
usage' command.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
In move_read_endio, we were checking if the next pending write has its
read completed - but this can turn after a use after free (and we were
accessing the list without a lock), so instead just better to just
unconditionally do the wakeup.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch improves the superblock .to_text() methods and adds methods
for all types that were missing them. It also improves printbufs by
allowing them to specfiy what units we want to be printing in, and adds
new wrapper methods for unifying our kernel and userspace environments.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch_scnmemcpy was for printing length-limited strings that might not
have a terminating null - turns out sprintf & pr_buf can do this with
%.*s.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When the nochanges option is selected, we're supposed to never issue
writes. Unfortunately, it seems discards were missed when implemnting
this, leading to some painful filesystem corruption.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Add an option that tells recovery to only read the journal, to be used
by the list_journal command.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This is prep work for the next patch, which is going to change
__bch2_trans_commit() to use bch2_journal_key_insert() when very early
in the recovery process, so that we have a unified interface for doing
btree updates.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When viewing what's in the journal, it's more useful to have the logical
location - journal bucket and offset within that bucket - than just the
offset on that device.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Apparently it actually is possible for crypto_skcipher_encrypt() to
return an error - not sure why that would be - but we need to replace
our assertion with actual error handling.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The error code when we fail to allocate a node in the btree node cache
doesn't make it to bch2_btree_path_traverse_all(). Instead, we need to
stash a flag in btree_trans so we know we have to take the cannibalize
lock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch2_dev_lookup() is used from the extended attribute set methods, for
setting the target options, where we're already holding an inode lock -
it turns out pathname lookups also take inode locks, so that was
susceptible to deadlocks.
Fortunately we already stash the device name in ca->name. This does
change user-visible behaviour though: instead of specifying e.g.
/dev/sda1, user must now specify sda1.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Before we had dedicated gc code for bucket->oldest_gen this was
btree_gc's responsibility, but now that we have that we can rip it out,
simplifying the already overcomplicated btree_gc.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This improves the formatting of journal_entry_btree_keys_to_text() by
putting each key on its own line.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The loop that traverses paths in traverse_all() needs to be a little bit
tricky, because traversing a path can cause other paths to be added (or
perhaps removed) at about the same position.
The old logic was buggy, replace it with simpler logic.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Some of our tracepoints were calling snprintf("pS") - which does symbol
table lookups - in TP_fast_assign(), which turns out to be a really bad
idea.
This was done because perf trace wasn't correctly printing tracepoints
that use %pS anymore - but it turns out trace-cmd does handle it
correctly.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Since we retry reads when we discover we read from a pointer that went
stale, if a dirty pointer is erroniously stale it would cause us to loop
retrying that read forever - unless we check before issuing the read,
while the btree is still locked, when we know that a dirty pointer
should never be stale.
This patch adds that check, along with printing some helpful debug info.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
__bch2_btree_node_lock() was implementing the wrong lock ordering for
cached vs. non cached paths - this fixes it to match the btree path sort
order as defined by __btree_path_cmp(), and also simplifies the code
some.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This consolidates some of the btree node lock path, so that when we're
blocked taking a write lock on a node it shows up in
bch2_btree_trans_to_text(), along with intent and read locks.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We were emitting two trace events on transaction restart in this code
path - delete the redundant one.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>