IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.
In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Allocations for copygc have to be kept separate from everything else,
so that copygc doesn't get starved.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We don't want writes to not get erasure coded just because the allocator
temporarily wasn't keeping up.
However, it's not guaranteed that these allocations will ever succeed,
we can currently get stuck - especially if devices are different sizes -
we still have work to do in this area.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, in the stripe creation path, when reusing an existing stripe
we'd read the existing stripe synchronously - ouch.
Now, we allocate two stripe bufs if we're using an existing stripe, so
that we can do the read asynchronously - and, we read the full stripe so
that we can run recovery, if necessary.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, we were using BTREE_INSERT_RESERVE in a lot of places where
it no longer makes sense.
- we now have more open_buckets than we used to, and the reserves work
better, so we shouldn't need to use BTREE_INSERT_RESERVE just because
we're holding open_buckets pinned anymore.
- We have the btree key cache for updates to the alloc btree, meaning
we no longer need the btree reserve to ensure the allocator can make
forward progress.
This means that we should only need a reserve for btree updates to
ensure that copygc can make forward progress.
Since it's now just for copygc, we can also fold RESERVE_BTREE into
RESERVE_MOVINGGC (the allocator's freelist reserve).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This now means "the maximum number of pointers within a bkey" - and
bch_devs_list is updated to use it instead of BCH_REPLICAS_MAX, since
stripes can contain more than BCH_REPLICAS_MAX pointers.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The stripe creation path was too state-machiney: it would always run the
full state machine until it had succesfully created a new stripe.
But if we tried to get and reuse an existing stripe after we'd already
allocated some buckets, the buckets we'd allocated might have conflicted
with the blocks in the existing stripe we need to keep - oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
it's useful to know whether an error was for a read or a write - this
also standardizes error messages a bit more.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
PAGE_SIZE and size_t are not unsigned longs on 32 bit, annoying...
also switch to atomic64_cmpxchg instead of cmpxchg() for
journal_seq_copy, as atomic64_cmpxchg has a fallback that uses spinlocks
for when it's not supported.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Instead of trying to charge EC parity to the data within the stripe
(which is subject to rounding errors), let's charge it to the stripe
itself. It should also make -ENOSPC issues easier to deal with if we
charge for parity blocks up front, and means we can also make more fine
grained accounting available to the user.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now that we've got transactional alloc info updates (and have for
awhile), we don't need to write it out on shutdown, and we don't need to
write it out on startup except when GC found errors - this is a big
improvement to mount/unmount performance.
This patch also fixes a few bugs where we weren't writing out alloc
info (on new filesystems, and new devices) and should have been.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
On large filesystems reading in the alloc info takes a significant
amount of time. But we don't need to be calling into the fully general
bch2_mark_key() path, just open code what we need in
bch2_alloc_read_fn().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
painful looking typo, fortunately difficult to hit.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
printbufs know how big the buffer is that was allocated, so we can get
rid of the random PAGE_SIZEs all over the place.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We define our own BLK_STS_REMOVED, so we need our own to_str helper too.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We ideally want the buckets used for the extra initial replicas to be
reused right away.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This solves internal fragmentation within stripes. We already have
copygc, which evacuates buckets that are partially or mostly empty, but
it's up to the ec code that manages stripes to deal with stripes that
have empty buckets in them.
This patch changes the path for creating new stripes to check if there's
existing stripes with empty buckets - and if so, update them with new
data buckets instead of creating new stripes.
TODO: improve the disk space accounting so that we can only use this
(more expensive path) when we have too much fragmentation in existing
stripes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Prep work for the patch to update existing stripes with new data blocks.
This moves allocating new stripes into ec.c, and also sets up the data
structures so that we can handly only allocating some of the blocks in a
stripe.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is mainly to solve a lock ordering issue, and also simplifies the
code a bit.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Soon we'll be able to modify existing stripes - replacing empty blocks
with new blocks and new p/q blocks. This patch updates the trigger code
to handle pointers changing in an existing stripe; also, it
significantly improves how the stripes heap works, which means we can
get rid of the stripe creation/deletion lock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Improved error messages are always a good thing
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When updates to interior nodes started being journalled, that meant that
after an unclean shutdown, until journal replay is done we can't walk
the btree without overlaying the updates from the journal.
The initial btree gc was changed to walk the btree overlaying keys from
the journal - but bch2_alloc_read() and bch2_stripes_read() were missed.
Major whoops...
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_btree_iter_set_pos() invalidates the key returned by peek().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
All iterators should be released now with bch2_trans_iter_put(), so
TRANS_RESET_ITERS shouldn't be needed anymore, and TRANS_RESET_MEM is
always used.
Also convert more code to __bch2_trans_do().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Introduce a new iterator that iterates over keys in the btree with keys
from the journal overlaid on top. This factors out what the erasure
coding init code was doing manually.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The trigger flags really belong with individual btree_insert_entries,
not the transaction commit flags - this splits out those flags and
unifies them with the BCH_BUCKET_MARK flags. Todo - split out
btree_trigger.c from buckets.c
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
BTREE_INSERT_ATOMIC should really be the default mode, and there's not
that much code that doesn't need it - so this is prep work for getting
rid of the flag.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This implements code for storing small bkeys on the stack and allocating
out of a mempool if they're too big.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The bkey_s_c returned by btree_iter_(peek|next) points into the btree
iter type, so advancing the iterator and then using the one previously
returned is a bug...
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This make the disk accounting code saner, and it's not clear why we'd
ever want the same data to be in multiple stripes simultaneously.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With reflink, various code now has to handle both KEY_TYPE_extent
or KEY_TYPE_reflink_v - so, convert it to be generic across all keys
with pointers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Change it to not mark keys that will be overwritten by keys in the
journal - this fixes a bug where we pop an assertion in
bucket_set_stripe() because of a stale pointer - because the stripe that
has the stale pointer has been deleted.
This code could be factored out and used elsewhere, at some point.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We weren't checking for errors when trying to delet stripes, which meant
ec_stripe_delete_work() would spin trying to delete the same stripe over
and over.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Switch to always using bio_add_page(), which merges contiguous pages now
that we have multipage bvecs.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>