When a reflink pointer points to a missing indirect extent, we replace
it with an error key. Instead of replacing the entire reflink pointer
with an error key, this patch replaces only the missing range with an
error key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This improves __bch2_trans_commit - early in the recovery process, when
we're running btree_gc and before we want to go RW, it now uses
bch2_journal_key_insert() to add the update to the list of updates for
journal replay to do, instead of btree_gc having to use separate
interfaces depending on whether we're running at bringup or, later,
runtime.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We currently need to call bch2_btree_path_peek_slot() multiple times in
the transaction commit path - and some of those need to be updated to
also check the keys from journal replay, too. Let's consolidate this and
stash the key being overwritten in btree_insert_entry.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We were double-freeing old_buckets and not freeing old_buckets_gens:
also, the code was supposed to free buckets, not old_buckets;
old_buckets is only needed because we have to use rcu_assign_pointer()
instead of swap(), and won't be set if we hit the error path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Previously, bucket fragmentation was considered to be bucket size minus
the total amount of live data, both dirty and cached.
This meant that if a bucket was full but only a small amount of data in
it was dirty - the rest cached, we'd get stuck: copygc wouldn't move the
dirty data out of the bucket and the allocator wouldn't be able to
invalidate and drop the cached data.
This changes fragmentation to exclude cached data, so that copygc will
evacuate these buckets and copygc/the allocator will always be able to
make forward progress.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
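The change in definition described in the commit above can be illustrated with a minimal userspace sketch. The function names and the use of plain sector counts are assumptions made for this example; this is not the bcachefs code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, all quantities in sectors.
 * Old definition: cached data counts as live data.
 */
static uint32_t fragmentation_old(uint32_t bucket_size,
                                  uint32_t dirty, uint32_t cached)
{
    assert(dirty + cached <= bucket_size);
    return bucket_size - (dirty + cached);
}

/* New definition: only dirty data counts as live data. */
static uint32_t fragmentation_new(uint32_t bucket_size, uint32_t dirty)
{
    assert(dirty <= bucket_size);
    return bucket_size - dirty;
}
```

For a 512-sector bucket holding 8 dirty and 504 cached sectors, the old definition reports no fragmentation at all, so copygc has no reason to touch the bucket; the new one reports 504 sectors, which makes the bucket a candidate for evacuation.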
More prep work for getting rid of the in-memory bucket array: now that
we have BTREE_ITER_WITH_JOURNAL, the allocator code can do btree lookups
before journal replay is finished, and there's no longer any need for it
to get allocation information from the in-memory bucket array.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Implement a hash table, using cuckoo hashing, for empty buckets that are
waiting on a journal commit before they can be reused.
This replaces the journal_seq field of bucket_mark, and is part of
eventually getting rid of the in memory bucket array.
We may need to make bch2_bucket_needs_journal_commit() lockless, pending
profiling and testing.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
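For readers unfamiliar with cuckoo hashing, here is a small, generic sketch of the technique. It is a fixed-size set of 64-bit keys with two tables and two hash functions; the names, table size, hash multipliers and kick limit are all assumptions for illustration and do not reflect the data structure added by the commit above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64u /* slots per table; matches the 6-bit hash below */
#define MAX_KICKS  16  /* give up after this many displacements */

struct cuckoo_set {
    /* 0 marks an empty slot, so 0 is not a valid key in this sketch */
    uint64_t slot[2][TABLE_SIZE];
};

/* Multiplicative hash; keeps the top 6 bits, giving an index in 0..63. */
static size_t hash_idx(uint64_t key, int which)
{
    static const uint64_t mult[2] = {
        0x9e3779b97f4a7c15ULL, 0xc2b2ae3d27d4eb4fULL
    };
    return (size_t)((key * mult[which]) >> 58);
}

/* A key can only ever live in one of two places, so lookup is O(1). */
static bool cuckoo_contains(const struct cuckoo_set *s, uint64_t key)
{
    if (key == 0)
        return false;
    return s->slot[0][hash_idx(key, 0)] == key ||
           s->slot[1][hash_idx(key, 1)] == key;
}

/*
 * Insert by placing the key in its slot and, if that slot was occupied,
 * moving the evicted key to its slot in the other table, and so on.
 * Returns false when the kick limit is hit; a real implementation would
 * resize or rehash at that point (here the last evicted key is lost).
 */
static bool cuckoo_insert(struct cuckoo_set *s, uint64_t key)
{
    int which = 0;

    if (key == 0)
        return false;
    if (cuckoo_contains(s, key))
        return true;

    for (int i = 0; i < MAX_KICKS; i++) {
        size_t idx = hash_idx(key, which);
        uint64_t evicted = s->slot[which][idx];

        s->slot[which][idx] = key;
        if (evicted == 0)
            return true;
        key = evicted;
        which ^= 1;
    }
    return false;
}
```

The appeal for this kind of use is that a membership test touches at most two slots, which keeps a "does this bucket still need a journal commit" style query cheap.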
We're trying to track down a bug that shows itself as newly-created
extents having stale dirty pointers - possibly due to the in memory gen
and the btree gen being inconsistent. This patch changes the error
message to also print out the in memory bucket gen when this happens.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds some missing diagnostics from rare but annoying to debug
runtime allocation failure paths.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
If the btree updates pointing to a bucket were never flushed by the
journal before the bucket became empty again, we can reuse the bucket
without a journal flush.
This tweaks the tracking of journal sequence numbers in alloc keys to
implement this optimization: now, we only update the journal sequence
number in alloc keys on transitions to and from empty. When a bucket
becomes empty, we check if we can tell the journal not to flush entries
starting from when the bucket was used.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch2_ec_mem_alloc() was only used by GC, and there's no real need to
preallocate the stripes radix tree since we can cope fine with memory
allocation failure when we use the radix tree. This deletes a fair bit
of code, and it's also needed for the upcoming patch because
bch2_btree_iter_peek_prev() won't be working before journal replay
completes (and using it was incorrect previously, as well).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The allocator needs to wait until the last update touching a bucket has
been committed before writing to it again. However, the code was checking
against the last dirty journal sequence number, not the last flushed
journal sequence number.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
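The condition the commit above describes can be stated as a one-line predicate. This is a sketch only; the helper name and its parameters are hypothetical, not the bcachefs interface.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper.
 * bucket_seq:  journal sequence number of the last update touching the bucket
 * flushed_seq: newest journal sequence number known to be durable on disk
 *
 * The bucket may be written to again only once that update has been flushed.
 */
static bool bucket_reusable(uint64_t bucket_seq, uint64_t flushed_seq)
{
    return bucket_seq <= flushed_seq;
}
```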
The main in-memory bucket array is going away, but we'll still need to
keep bucket generations in memory, at least for now - ptr_stale() needs
to be an efficient operation.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Since the main in memory bucket array is going away, we don't want to be
calling bucket() or __bucket() when what we want is the GC in-memory
bucket.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch2_journal_key_insert() used to assume that the key passed to it was
allocated with kmalloc(), and on success took ownership. This patch
deletes that behaviour, making it more similar to
bch2_trans_update()/bch2_trans_commit().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Now that bch2_bucket_alloc_new_fs() isn't looking at bucket marks to
decide what buckets are eligible to allocate, we can clean up the
filesystem initialization and device add paths. Previously, we had to
use ancient code to mark superblock/journal buckets in the in memory
bucket marks as we allocated them, and then zero that out and re-do that
marking using the newer transactional bucket mark paths. Now, we can
simply delete the in-memory bucket marking.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds flags for options that must be a power of two (block size and
btree node size), and options that are stored in the superblock as a
power of two (encoded extent max).
Also: options are now stored in memory in the same units they're
displayed in (bytes): we now convert when getting and setting from the
superblock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
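As an illustration of the two properties the commit above mentions, here is a small sketch that validates a power-of-two value and converts between a byte count and a base-2 exponent. The helper names are hypothetical, and storing the exponent is an assumption about what "stored as a power of two" means; the actual superblock encoding may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v is a nonzero power of two. */
static bool is_pow2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical: convert an in-memory byte count to a stored exponent. */
static unsigned opt_to_sb(uint64_t bytes)
{
    unsigned shift = 0;

    assert(is_pow2(bytes));
    while (bytes >>= 1)
        shift++;
    return shift;
}

/* Hypothetical: convert a stored exponent back to bytes. */
static uint64_t opt_from_sb(unsigned shift)
{
    assert(shift < 64);
    return UINT64_C(1) << shift;
}
```

With this scheme a 4096-byte block size round-trips through the exponent 12, and a value such as 3000 is rejected by the power-of-two check before it is ever stored.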
This moves some common code into alloc_mem_to_key(), which translates
from the in-memory format for a bucket to the btree key format.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds a new helper, much like the one we have for inode updates,
that allocates the packed alloc key, packs it and calls
bch2_trans_update.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We have two radix trees of stripes - one that mirrors some information
from the stripes btree in normal operation, and another that GC uses to
recalculate block usage counts.
The normal one is now only used for finding partially empty stripes in
order to reuse them - the normal stripes radix tree and the GC stripes
radix tree are used significantly differently, so this patch splits them
into separate types.
In an upcoming patch we'll be replacing c->stripes with a btree that
indexes stripes by the order we want to reuse them.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
With snapshots, bch2_trans_update() has to check if we need a whiteout,
which can cause a transaction restart, so this is important now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When we added the stripe and stripe_redundancy fields to alloc keys, we
neglected to add them to the functions that convert back and forth with
the in-memory types.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This simplifies the code quite a bit and eliminates an inconsistency - a
given bkey doesn't necessarily translate to a single replicas entry for
disk space accounting.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This changes the bch2_mark_key() and related paths to take mark lock
where it is needed, instead of taking it in the upper transaction commit
path - by pushing down locking we'll be able to handle fsck errors
locally instead of requiring a separate check in the btree_gc code for
replicas being marked.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This changes bch2_trans_fs_usage_apply() to handle failure (replicas
entry missing) by reverting the changes it made - meaning we can make
the main transaction commit path a bit slimmer, and perhaps also
simplify some locking in upcoming patches.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Currently, btree triggers are run in natural key order, which presents a
problem for fallocate in INSERT_RANGE mode: since we're moving existing
extents to higher offsets, the trigger for deleting the old extent runs
before the trigger that adds the new extent, potentially leading to
indirect extents being deleted that shouldn't be when the delete causes
the refcount to hit 0.
This changes the order we run triggers so that for a given btree, we run
all insert triggers before overwrite triggers, nicely sidestepping this
issue.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
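The ordering problem in the commit above can be reproduced with a toy model of a refcounted indirect extent. Everything here (the struct, the two trigger functions, the rule that a deleted extent cannot be re-referenced) is a simplifying assumption for illustration, not the real trigger code.

```c
#include <stdbool.h>

/* Toy model of an indirect extent referenced by reflink pointers. */
struct indirect_extent {
    unsigned refcount;
    bool     deleted;
};

/* Trigger for a newly inserted key that references the extent. */
static void insert_trigger(struct indirect_extent *e)
{
    if (!e->deleted)
        e->refcount++;
}

/* Trigger for an overwritten (deleted) key that referenced the extent. */
static void overwrite_trigger(struct indirect_extent *e)
{
    if (e->refcount && --e->refcount == 0)
        e->deleted = true;
}

/*
 * Move the single reference to a higher offset, as INSERT_RANGE does.
 * Returns whether the indirect extent is still alive afterwards.
 */
static bool survives_move(bool inserts_first)
{
    struct indirect_extent e = { .refcount = 1, .deleted = false };

    if (inserts_first) {
        insert_trigger(&e);    /* refcount 1 -> 2 */
        overwrite_trigger(&e); /* refcount 2 -> 1 */
    } else {
        overwrite_trigger(&e); /* refcount 1 -> 0, extent deleted */
        insert_trigger(&e);    /* too late */
    }
    return !e.deleted;
}
```

Running insert triggers first keeps the refcount from ever touching zero during the move, which is the effect the reordering is after.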
The filesystem initialization path first marks superblock and journal
buckets non transactionally, since the btree isn't functional yet. That
path was updating the per-journal-buf percpu counters via
bch2_dev_usage_update(), and updating the wrong set of counters so those
updates didn't get written out until journal entry 4.
The relevant code is going to get significantly rewritten in the future
as we transition away from the in memory bucket array, so this just
hacks around it for now.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Add fields to inode & alloc keys that record the journal sequence number
when they were most recently modified.
For alloc keys, this is needed to know what journal sequence number we
have to flush before the bucket can be reused. Currently this is tracked
in memory, but we'll be getting rid of the in memory bucket array.
For inodes, this is needed for fsync when the inode has been evicted
from the vfs cache. Currently we use a bloom filter per outstanding
journal buf - but that mechanism has been broken since we added the
ability to not issue a flush/fua for every journal write.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This allows triggers to distinguish between a key entering the btree -
i.e. being called from the trans commit path - vs. being called on a key
that already exists, i.e. by GC.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This helps to unify the interface between bch2_mark_key() and
bch2_trans_mark_key() - and it also gives access to the journal
reservation and journal seq in the mark_key path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- The backpointer that ec_stripe_update_ptrs() uses now needs to include
the snapshot ID, which means we have to change where we add the
backpointer to after getting the snapshot ID for the new extents
- ec_stripe_update_ptrs() needs to be calling bch2_trans_begin()
- improve error message in bch2_mark_stripe()
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When the old or new key doesn't exist, we should still pass in a deleted
key with the correct pos. This fixes a bug in the ec code, when
bch2_mark_stripe() was looking up the wrong in-memory stripe.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
The way __bch2_mark_reflink_p returns errors was clashing with returning
the number of sectors processed - we weren't returning FSCK_ERR_EXIT
correctly.
Fix this by only using the return code for errors, which actually ends
up simplifying the overall logic.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When a reflink pointer points to only part of an indirect extent, and
then that indirect extent is fragmented (e.g. by copygc), if the reflink
pointer only points to one of the fragments we leak a reference.
Fix this by storing front/back pad values in reflink pointers - when
inserting reflink pointers, we initialize them to cover the full range
of the indirect extents we reference.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When a reflink pointer points to an indirect extent that doesn't exist,
we need to replace it with a KEY_TYPE_error key.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch adds subvolume.c - support for the subvolumes and snapshots
btrees and related data types and on disk data structures. The next
patches will start hooking up this new code to existing code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This splits btree_iter into two components: btree_iter is now the
externally visible component, and it points to a btree_path which is now
reference counted.
This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
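The sharing scheme described in the commit above is a form of copy-on-write with reference counts. The sketch below shows the general idea with made-up types (`struct path`, `struct iter`) and a single `pos` field standing in for all path state; it is not the bcachefs implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the shared, reference-counted part. */
struct path {
    unsigned ref;
    uint64_t pos;
};

/* Stand-in for the externally visible handle. */
struct iter {
    struct path *path;
};

static struct path *path_get(struct path *p)
{
    p->ref++;
    return p;
}

static void path_put(struct path *p)
{
    if (--p->ref == 0)
        free(p);
}

/*
 * Return a path that is safe to modify: the same one if we hold the only
 * reference, otherwise a private copy (dropping our ref on the original).
 */
static struct path *path_make_mut(struct path *p)
{
    struct path *copy;

    if (p->ref == 1)
        return p;

    copy = malloc(sizeof(*copy));
    if (!copy)
        return NULL;
    *copy = *p;
    copy->ref = 1;
    p->ref--;
    return copy;
}

/* Mutating through an iterator clones the path only if it is shared. */
static int iter_set_pos(struct iter *it, uint64_t pos)
{
    struct path *p = path_make_mut(it->path);

    if (!p)
        return -1;
    p->pos = pos;
    it->path = p;
    return 0;
}
```

Two iterators that start out sharing one path pay for a copy only at the moment one of them moves, which is why nothing has to be cloned up front.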
This fix replaces multiple 64 bit divisions with do_div() equivalents.
Signed-off-by: Brett Holman <bholman.devel@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
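For context, the kernel's do_div(n, base) divides a 64-bit value in place by a 32-bit divisor and returns the remainder, which avoids a plain 64-bit `/` that 32-bit kernel builds typically cannot link. The function below is a userspace stand-in with the same behaviour, written as a function taking a pointer rather than as a macro; the name `do_div_sketch` is made up for this example.

```c
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's do_div():
 * *n is replaced by the quotient, the remainder is returned.
 */
static uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}
```

A call site that used to read `q = n / d;` becomes a two-step sequence: copy `n` into a temporary, then divide the temporary in place and use it as the quotient.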
DIV_ROUND_UP() wasn't doing what we wanted when passing it negative
numbers - fix it by just not passing it negative numbers anymore.
Also, no need to do the scaling by compression ratio for incompressible
data.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
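One way the macro misbehaves with negative inputs can be shown in a few lines. The macro definition below matches the usual kernel form; the wrapper function and the chosen values are just for illustration, and the exact values involved in the original bug are not stated in the commit message.

```c
#include <stdint.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Wrapper so the macro is evaluated on signed 64-bit operands. */
static int64_t div_round_up(int64_t n, int64_t d)
{
    return DIV_ROUND_UP(n, d);
}
```

For positive operands this rounds up as expected: `div_round_up(7, 4)` is 2. For a negative numerator the addition of `d - 1` combined with C's truncation toward zero gives `div_round_up(-8, 4) == -1`, even though -8 / 4 is exactly -2.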
- We no longer mark subsets of extents, they're marked like regular
keys now - which means we can drop the offset & sectors arguments
to trigger functions
- Drop other arguments that are no longer needed in various
places - fs_usage
- Drop the logic for handling extents in bch2_mark_update() that isn't
needed anymore, to match bch2_trans_mark_update()
- Better logic for handling the BTREE_ITER_CACHED_NOFILL case, where we
don't have an old key to mark
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Adding iter->should_be_locked introduced a regression where it ended up
not being set on the iterator passed to bch2_btree_update_start(), which
is definitely not what we want.
This patch requires it to be set when calling bch2_trans_update(), and
adds various fixups to make that happen.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Now that bch2_btree_iter_peek_with_updates() has been removed in favor
of BTREE_ITER_WITH_UPDATES, we need to make sure it's not used where we
don't want it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This lifts handling of overlapping extents out of __bch2_trans_commit()
and moves it to where we first do the update - which means that
BTREE_ITER_WITH_UPDATES can now work correctly in extents mode.
Also, this patch reworks how extent triggers work: previously, on
partial extent overwrite we would pass this information to the trigger,
telling it what part of the extent was being overwritten. But, this
approach has had too many subtle corner cases - now, we only mark whole
extents, meaning on partial extent overwrite we unmark the old extent
and mark the new extent.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>