This patch adds subvolume.c - support for the subvolumes and snapshots
btrees and related data types and on disk data structures. The next
patches will start hooking up this new code to existing code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Instead of unconditionally upgrading read locks to intent locks in
do_bch2_trans_commit(), this patch changes the path that takes write
locks to first trylock, and then if trylock fails check if we have a
conflicting read lock, and restart the transaction if necessary.
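A rough sketch of the shape of this, in plain C with hypothetical types
and names (none of this is the actual bcachefs locking code):

  struct node_lock {
          int     write_locked;   /* held exclusively by a writer */
          int     readers;        /* number of read holders */
  };

  struct trans_ctx {
          int     holds_read_lock; /* do we hold a read lock on this node? */
  };

  enum lock_result {
          LOCK_OK,
          LOCK_RESTART,   /* caller must restart the transaction */
          LOCK_WAIT,      /* safe to block waiting for the lock */
  };

  static enum lock_result take_write_lock(struct trans_ctx *trans,
                                          struct node_lock *l)
  {
          /* First, try to take the write lock without blocking: */
          if (!l->write_locked && !l->readers) {
                  l->write_locked = 1;
                  return LOCK_OK;
          }

          /*
           * Trylock failed: if we ourselves hold a read lock on this
           * node, blocking would self deadlock - restart instead:
           */
          if (trans->holds_read_lock)
                  return LOCK_RESTART;

          /* Otherwise it's another thread - safe to wait: */
          return LOCK_WAIT;
  }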
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This is a new approach to avoiding the self deadlock we'd get if we
tried to take a write lock on a node while holding a read lock - we
simply upgrade the readers to intent.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch significantly reduces the number of btree lookups required in
the extent update path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Since btree_path is now internally refcounted, we don't need to clone an
iterator before calling bch2_trans_update() if we'll be mutating it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This splits btree_iter into two components: btree_iter is now the
externally visible component, and it points to a btree_path, which is
now reference counted.
This means we no longer have to clone iterators up front if they might
be mutated - btree_path can be shared by multiple iterators, and cloned
if an iterator would mutate a shared btree_path. This will help us use
iterators more efficiently, as well as slimming down the main long lived
state in btree_trans, and significantly cleans up the logic for iterator
lifetimes.
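As a loose illustration of the clone-on-mutation idea (hypothetical
struct and helper names, heavily simplified from the real btree_path):

  #include <stdlib.h>
  #include <string.h>

  struct path {
          unsigned        ref;    /* iterators pointing at this path */
          unsigned long   pos;    /* stand-in for the real position state */
  };

  struct iter {
          struct path     *path;
  };

  /* Before mutating, clone the path if another iterator also points at it
   * (allocation failure handling omitted in this sketch): */
  static void iter_make_mut(struct iter *it)
  {
          if (it->path->ref > 1) {
                  struct path *copy = malloc(sizeof(*copy));

                  memcpy(copy, it->path, sizeof(*copy));
                  copy->ref = 1;
                  it->path->ref--;
                  it->path = copy;
          }
  }

  static void iter_set_pos(struct iter *it, unsigned long new_pos)
  {
          iter_make_mut(it);      /* leave other iterators undisturbed */
          it->path->pos = new_pos;
  }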
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When we detect an invalid key being inserted, we should print what code
was doing the update.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We really only need to distinguish between btree iterators and btree key
cache iterators - this is more prep work for btree_path.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
These utility functions are for managing btree node state within a
btree_trans - rename them for consistency, and drop some unneeded
arguments.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This is prep work for splitting btree_path out from btree_iter -
btree_path will not have a pointer to btree_trans.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
BTREE_ITER_SET_POS_AFTER_COMMIT is used internally to automagically
advance extent btree iterators on successful commit.
But with the upcoming btree_path patch it's getting more awkward to
support, it adds overhead to core data structures for something used in
only a few places, and it can easily be done by the caller instead.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This consolidates the code for doing extent updates, and makes the btree
iterator usage a bit cleaner and more efficient.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This factors out bch2_dump_trans_iters_updates() from the iter alloc
overflow path, and makes some small improvements to what it prints.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Inode creation is done with non-cached btree iterators, but then in the
same transaction the inode may be updated again with a cached iterator -
it makes cache coherency easier if new inodes always land in the
underlying btree.
This patch adds a check to bch2_trans_update() - if the same key is
updated multiple times in the same transaction with both cached and
non-cached iterators, use the non-cached iterator.
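Very roughly, the check looks something like this (hypothetical update
list and flag names, not the real bch2_trans_update()):

  #define ITER_CACHED     (1U << 0)       /* assumed flag name */

  struct update {
          unsigned long   key;
          unsigned        iter_flags;
          int             value;
  };

  /*
   * If this key already has an update through a non-cached iterator in
   * this transaction, keep routing it through the btree rather than the
   * key cache:
   */
  static void trans_add_update(struct update *updates, unsigned *nr,
                               struct update new)
  {
          unsigned i;

          for (i = 0; i < *nr; i++)
                  if (updates[i].key == new.key) {
                          if (!(updates[i].iter_flags & ITER_CACHED))
                                  new.iter_flags &= ~ITER_CACHED;
                          updates[i] = new;
                          return;
                  }

          updates[(*nr)++] = new;
  }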
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
With the recent transaction restart changes, it's no longer needed - all
transaction commits have BTREE_INSERT_NOUNLOCK semantics.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Start tracking when btree transactions have been restarted - and assert
that we're always calling bch2_trans_begin() immediately after
transaction restart.
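A minimal sketch of what such tracking and assertion might look like
(hypothetical fields, not the actual btree_trans):

  #include <assert.h>

  struct trans {
          int     restarted;      /* set when a restart was requested */
  };

  static void trans_restart(struct trans *t)
  {
          t->restarted = 1;
  }

  static void trans_begin(struct trans *t)
  {
          t->restarted = 0;       /* only trans_begin() clears it */
  }

  /* Every other entry point asserts the caller handled the restart: */
  static void trans_op(struct trans *t)
  {
          assert(!t->restarted);
          /* ... do the operation ... */
  }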
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
They should already be traversed, and we've been asserting that since
the introduction of iter->should_be_locked.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This closes a significant hole (and last known hole) in our ability to
verify metadata. Previously, since btree nodes are log structured, we
couldn't detect lost btree writes that weren't the first write to a
given node. Additionally, this seems to have led to some significant
metadata corruption on multi device filesystems with metadata
replication: since a write may have made it to one device and not
another, if we read that btree node back from the replica that did have
that write and started appending after that point, the other replica
would have a gap in the bset entries and reading from that replica
wouldn't find the rest of the bsets.
But, since updates to interior btree nodes are now journalled, we can
close this hole by updating pointers to btree nodes after every write
with the currently written number of sectors, without negatively
affecting performance. This means we will always detect lost or corrupt
metadata - it also means that our btree is now a curious hybrid of COW
and non COW btrees, with all the benefits of both (excluding
complexity).
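Very roughly, the idea is that the parent's pointer records how much of
the child has been written so far, so a short read from a stale replica
is detectable (hypothetical fields, not the on-disk format):

  struct node_ptr {
          unsigned long long      offset;          /* where the child lives */
          unsigned                sectors_written; /* valid data so far */
  };

  struct node {
          unsigned        sectors_written;
  };

  /* After each append to the node, the (journalled) interior update
   * records the new write offset in the parent's pointer: */
  static void note_node_write(struct node_ptr *parent_ptr, struct node *n,
                              unsigned just_written)
  {
          n->sectors_written += just_written;
          parent_ptr->sectors_written = n->sectors_written;
  }

  /* On read, a node shorter than its parent claims indicates a lost or
   * torn write - e.g. a replica that missed the final append: */
  static int node_read_ok(const struct node_ptr *parent_ptr,
                          const struct node *n)
  {
          return n->sectors_written >= parent_ptr->sectors_written;
  }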
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
btree_trans should always be passed when we have one - iter->trans is
disfavoured. This mainly updates old code in btree_update_interior.c,
some of which predates btree_trans.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Add a new flag to control assertions about updates to internal snapshot
nodes, which normally should not be written to - to be used in an
upcoming patch.
Also do some renaming - trigger_flags is now update_flags.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- We no longer mark subsets of extents; they're marked like regular
  keys now - which means we can drop the offset & sectors arguments
  to trigger functions
- Drop other arguments that are no longer needed in various
  places - fs_usage
- Drop the logic for handling extents in bch2_mark_update() that isn't
  needed anymore, to match bch2_trans_mark_update()
- Better logic for handling the BTREE_ITER_CACHED_NOFILL case, where we
  don't have an old key to mark
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Adding iter->should_be_locked introduced a regression where it ended up
not being set on the iterator passed to bch2_btree_update_start(), which
is definitely not what we want.
This patch requires it to be set when calling bch2_trans_update(), and
adds various fixups to make that happen.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
With trans->updates2 gone, we can now drop this helper and use
bch2_btree_delete_at() instead.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
We haven't had extent merging in quite some time. It used to be done by
the btree code when sorting btree nodes, but that was eliminated as part
of the work to separate extent handling from core btree code.
This patch re-implements extent merging in the transaction commit path.
We don't currently have the ability to merge reflink pointers; we need
to do some work on the triggers code to be able to do that without
ending up with incorrect refcounts.
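As a rough sketch of what commit-time merging means (hypothetical extent
representation, not the actual bcachefs keys or triggers):

  struct extent {
          unsigned long long      offset;         /* logical start, sectors */
          unsigned                sectors;        /* length */
          unsigned long long      disk_offset;    /* physical start */
  };

  /* Two extents merge if they're logically and physically contiguous: */
  static int extents_mergeable(const struct extent *l, const struct extent *r)
  {
          return l->offset + l->sectors == r->offset &&
                 l->disk_offset + l->sectors == r->disk_offset;
  }

  /* At commit time, try folding a new extent into its left neighbour: */
  static int extent_try_merge(struct extent *prev, const struct extent *new)
  {
          if (!extents_mergeable(prev, new))
                  return 0;

          prev->sectors += new->sectors;
          return 1;       /* caller drops the separate insert */
  }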
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Now that extent handling has been lifted to bch2_trans_update(), we
don't need to keep two different lists of updates.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This lifts handling of overlapping extents out of __bch2_trans_commit()
and moves it to where we first do the update - which means that
BTREE_ITER_WITH_UPDATES can now work correctly in extents mode.
Also, this patch reworks how extent triggers work: previously, on
partial extent overwrite we would pass this information to the trigger,
telling it what part of the extent was being overwritten. But, this
approach has had too many subtle corner cases - now, we only mark whole
extents, meaning on partial extent overwrite we unmark the old extent
and mark the new extent.
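Roughly, whole-key marking means the trigger only ever sees complete old
and new keys (hypothetical types, not the real trigger interface):

  struct ext {
          unsigned long long      start;
          unsigned                sectors;
  };

  /*
   * Partial overwrites are expressed as "unmark the old key, mark the
   * new key" - the trigger never reasons about which slice of a key was
   * overwritten:
   */
  static void mark_extent_update(const struct ext *old, const struct ext *new,
                                 long long *sectors_used)
  {
          if (old)
                  *sectors_used -= old->sectors;
          if (new)
                  *sectors_used += new->sectors;
  }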
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This drops bch2_btree_iter_peek_with_updates() and replaces it with a
new flag, BTREE_ITER_WITH_UPDATES, and also reworks
bch2_btree_iter_peek_slot() to respect it too.
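Conceptually, a peek that respects the flag returns whichever comes
first: the next committed key, or the next update still queued in the
transaction (hypothetical and much simplified):

  #define ITER_WITH_UPDATES       (1U << 0)       /* assumed flag name */

  struct k {
          unsigned long   pos;
          int             valid;
  };

  static struct k peek(unsigned iter_flags, struct k btree_next,
                       struct k pending_update)
  {
          if ((iter_flags & ITER_WITH_UPDATES) &&
              pending_update.valid &&
              (!btree_next.valid || pending_update.pos <= btree_next.pos))
                  return pending_update;

          return btree_next;
  }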
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch adds some new tracepoints to the btree iterator code, and
adds new fields to the existing tracepoints - primarily for the iterator
position.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Upcoming refactoring is going to change bch2_trans_update() to start
returning transaction restarts.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Buffered writes may have to increase their disk reservation at btree
update time, due to compression and erasure coding being unpredictable:
O_DIRECT writes should be checking for -ENOSPC, but buffered writes have
already been accepted and should not.
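The distinction might look roughly like this (hypothetical flag and
helper, not the real reservation code):

  #include <errno.h>

  #define RESERVE_NOFAIL  (1U << 0)  /* buffered writeback: data already accepted */

  static int add_reservation(long long *sectors_free, unsigned sectors,
                             unsigned flags)
  {
          if (*sectors_free < sectors && !(flags & RESERVE_NOFAIL))
                  return -ENOSPC;         /* O_DIRECT can still back out */

          *sectors_free -= sectors;       /* buffered writeback proceeds */
          return 0;
  }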
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Currently, we handle multiple overlapping extents in the same
transaction commit by doing fixups in bch2_trans_update() - this patch
extends that to splitting updates when necessary. The next patch, which
changes the reflink code to not fragment extents when making them
indirect, will require this.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
It was being skipped when hole punching, leading to problems when
splitting compressed extents.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch2_varint_decode() can read up to 7 bytes past the end of the buffer,
which means we need to allocate slightly larger key cache buffers.
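A minimal sketch of the sort of padding this implies (hypothetical
helper; the actual allocation sites differ):

  #include <stdlib.h>
  #include <string.h>

  /* The varint decoder may read up to 7 bytes past the encoded value,
   * so buffers holding varint-encoded keys get 7 bytes of slack: */
  #define VARINT_DECODE_SLACK     7

  static void *key_buf_alloc(const void *key, size_t key_bytes)
  {
          void *buf = malloc(key_bytes + VARINT_DECODE_SLACK);

          if (buf) {
                  memcpy(buf, key, key_bytes);
                  /* zero the slack so the overread sees defined bytes */
                  memset((char *) buf + key_bytes, 0, VARINT_DECODE_SLACK);
          }
          return buf;
  }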
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Using the normal transaction commit path to insert and journal updates
to interior nodes hadn't been done before this repair code was written,
so it's not surprising that there was a bug.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We check for this prior to metadata being written, but we're seeing some
strange bugs lately, and this will help catch those closer to where they
occur.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Can't run arbitrary code inside a wait_event() conditional: the
condition is evaluated after the task has already been marked as
sleeping, so code that blocks or touches task state is unsafe there.
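The usual safe pattern, sketched with a hypothetical wait site (the real
call site being fixed is different):

  #include <linux/wait.h>

  static DECLARE_WAIT_QUEUE_HEAD(wq);
  static int done;

  static void wait_for_done(void)
  {
          /*
           * Anything heavyweight happens here, before waiting: the
           * expression passed to wait_event() runs with the task already
           * marked as sleeping, so it must only test simple state.
           */
          wait_event(wq, done);
  }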
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>