IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Implement a hash table, using cuckoo hashing, for empty buckets that are
waiting on a journal commit before they can be reused.
This replaces the journal_seq field of bucket_mark, and is part of
eventually getting rid of the in memory bucket array.
We may need to make bch2_bucket_needs_journal_commit() lockless, pending
profiling and testing.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We're trying to track down a bug that shows itself as newly-created
extents having stale dirty pointers - possibly due to the in memory gen
and the btree gen being inconsistent. This patch changes the error
message to also print out the in memory bucket gen when this happens.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
btree_gc sometimes needs another pass when it corrects bucket generation
numbers or data types - when it finds multiple pointers of different
data types to the same bucket, it may want to keep the second one it
found.
When this happens, we now clear out bucket sector counts _without_
resetting the bucket generation/data types that we already found,
instead of resetting them to what we have in the alloc btree.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The repair for for btree_ptrs was saying one thing and doing another -
fortunately, that code can just be deleted.
Also, when we update a btree node pointer, we also have to update node
in memery, if it exists in the btree node cache - this fixes
bch2_check_fix_ptrs() to do that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This reverts commit f95b61228efd04c9c158123da5827c96e9773b29.
It turns out, we're seeing filesystems in the wild end up with
blacklisted btree node bsets - this should not be happening, and until
we understand why and fix it we need to keep this code around.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- Add a shim uuid_unparse_lower() in the kernel, since %pU doesn't work
in userspace
- We don't need to print the bcachefs: or the filesystem name prefix in
userspace
- Improve a few error messages
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Symbol decoding, via %ps, isn't supported in userspace - this will also
be faster when we're using trans->fn in the fast path, as with the new
BCH_JSET_ENTRY_log journal messages.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The old code correctly handled the case where we were blacklisting a
range that exactly matched an existing entry, but not the case where the
new range partially overlaps an existing entry.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch converts bch2_sb_validate() and the .validate methods for the
various superblock sections to take printbuf, to which they can print
detailed error messages, including printing the entire section that was
invalid.
This is a great improvement over the previous situation, where we could
only return static strings that didn't have precise information about
what was wrong.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch2_trans_commit() can legitimately return -ENOSPC with
BTREE_INSERT_NOFAIL set if BTREE_INSERT_NOWAIT was also set.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Add a field to bch_dev for the dev_t of the underlying block device -
this fixes a null ptr deref in tracepoints.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
With BTREE_ITER_WITH_JOURNAL, there's no longer any restrictions on the
order we have to replay keys from the journal in, and we can also start
up journal reclaim right away - and delete a bunch of code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds a new btree iterator flag, BTREE_ITER_WITH_JOURNAL, that is
automatically enabled when initializing a btree iterator before journal
replay has completed - it overlays the contents of the journal with the
btree.
This lets us delete bch2_btree_and_journal_walk() and just use the
normal btree iterator interface instead - which also lets us delete a
significant amount of duplicated code.
Note that BTREE_ITER_WITH_JOURNAL is still unoptimized in this patch -
we're redoing the binary search over keys in the journal every time we
call bch2_btree_iter_peek().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If we're not running fsck we still want to set BCH_FS_FSCK_DONE, so that
bch2_fsck_err() calls are interpreted as bch2_inconsistent_error()
calls().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Add a flag to indicate whether a journal replay key has been
overwritten, and set/test it with appropriate btree locks held.
This fixes a race between the allocator - invalidating buckets, and
doing btree updates - and journal replay, which before this patch could
clobber the allocator thread's update with an older version of the key
from the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds a _to_text() pretty printer for journal entries - including
every subtype - which will shortly be used by the 'bcachefs
list_journal' subcommand.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Add a journal entry type for logging messages, and add an option to use
it to log the transaction name - this makes for a very handy debugging
tool, as with it we can use the 'bcachefs list_journal' command to see
not only what updates were done, but what was doing them.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds some missing diagnostics from rare but annoying to debug
runtime allocation failure paths.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The upcoming BTREE_ITER_WITH_JOURNAL patch will require journal keys to
stay in sorted order, so the btree iterator code can overlay them over
btree keys.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This converts the error messages in the device add to a better style,
and adds some missing ones.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
There's places where we parse these numbers, and our parsing doesn't
cope with decimals currently - this is a hack to get the device_add path
working again where for the device blocksize there doesn't ever need to
be a decimal.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
If the btree updates pointing to a bucket were never flushed by the
journal before the bucket became empty again, we can reuse the bucket
without a journal flush.
This tweaks the tracking of journal sequence numbers in alloc keys to
implement this optimization: now, we only update the journal sequence
number in alloc keys on transitions to and from empty. When a bucket
becomes empty, we check if we can tell the journal not to flush entries
starting from when the bucket was used.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Since dirty extents can be moved or overwritten, it's not just cached
data that we need the ptr_stale() check in bc2h_read_endio for - this
fixes data checksum errors seen in the tiering ktest tests.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Add bch2_journal_noflush_seq(), for telling the journal that entries
before a given sequence number should not be flushes - to be used by an
upcoming allocator optimization.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This is to help with diagnosing why the btree node can doesn't seem to
be shrinking - we've had issues in the past with granularity/batch size,
since btree nodes are so big.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
In the recovery path, we scan for old btree nodes if we don't have
certain compat bits set. If we do this, we should be doing it after we
upgraded to the newest on disk format.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- BTREE_ITER_ALL_SNAPSHOTS flag is required here
- change it to also walk the reflink btree
- change it to accumulate stats for all pointers in an extent
- change it to account for incompressible data
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch2_ec_mem_alloc() was only used by GC, and there's no real need to
preallocate the stripes radix tree since we can cope fine with memory
allocation failure when we use the radix tree. This deletes a fair bit
of code, and it's also needed for the upcoming patch because
bch2_btree_iter_peek_prev() won't be working before journal replay
completes (and using it was incorrect previously, as well).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The allocator needs to wait until the last update touching a bucket has
been commited before writing to it again. However, the code was checking
against the last dirty journal sequence number, not the last flushed
journal sequence number.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The main in-memory bucket array is going away, but we'll still need to
keep bucket generations in memory, at least for now - ptr_stale() needs
to be an efficient operation.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Since the main in memory bucket array is going away, we don't want to be
calling bucket() or __bucket() when what we want is the GC in-memory
bucket.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This is so that the copygc code doesn't have to refer to
bucket_mark.owned_by_allocator - assisting in getting rid of the in
memory bucket array.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Prep work for adding a hash table of open buckets - instead of embedding
a bch_extent_ptr, we need to refer to the bucket directly so that we're
not calling sector_to_bucket() in the hash table lookup code, which has
an expensive divide.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Since metadata version bcachefs_metadata_version_btree_ptr_sectors_written,
we haven't needed the journal seq blacklist mechanism for ignoring
blacklisted btree node writes - we now only need it for ignoring journal
entries that were written after the newest flush journal entry, and then
we only need to keep those blacklist entries around until journal replay
is finished.
That means we can delete the code for scanning btree nodes to GC
journal_seq_blacklist entries.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This fixes a rare bug when mounting & unmounting RO - flushing a clean
filesystem that never went RO should be a no op.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>