IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
vstruct_bytes() was returning a u64 - it should be a size_t, the corect
type for the size of anything that fits in memory.
Also replace a 64 bit divide with div_u64().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch was originally to work around the journal geting stuck in
nochanges mode - but that was just a hack, we needed to fix the actual
bug. It should be fixed now, so revert it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Long ago it was possible to get a journal reservation and not use it,
but that's no longer allowed, which means journal_write_compact() has
very little work to do, and isn't really worth the code anymore.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch improves the superblock .to_text() methods and adds methods
for all types that were missing them. It also improves printbufs by
allowing them to specfiy what units we want to be printing in, and adds
new wrapper methods for unifying our kernel and userspace environments.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
bch_scnmemcpy was for printing length-limited strings that might not
have a terminating null - turns out sprintf & pr_buf can do this with
%.*s.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
When viewing what's in the journal, it's more useful to have the logical
location - journal bucket and offset within that bucket - than just the
offset on that device.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Apparently it actually is possible for crypto_skcipher_encrypt() to
return an error - not sure why that would be - but we need to replace
our assertion with actual error handling.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This improves the formatting of journal_entry_btree_keys_to_text() by
putting each key on its own line.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Change the error messages in bch2_inconsistent_error() and
bch2_fatal_error() so we can distinguish them.
Also, prefer bch2_fs_fatal_error() (which also logs an error message) to
bch2_fatal_error(), and change a call to bch2_inconsistent_error() to
bch2_fatal_error() when we can't continue.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Implement a hash table, using cuckoo hashing, for empty buckets that are
waiting on a journal commit before they can be reused.
This replaces the journal_seq field of bucket_mark, and is part of
eventually getting rid of the in memory bucket array.
We may need to make bch2_bucket_needs_journal_commit() lockless, pending
profiling and testing.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds a _to_text() pretty printer for journal entries - including
every subtype - which will shortly be used by the 'bcachefs
list_journal' subcommand.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Add a journal entry type for logging messages, and add an option to use
it to log the transaction name - this makes for a very handy debugging
tool, as with it we can use the 'bcachefs list_journal' command to see
not only what updates were done, but what was doing them.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Add bch2_journal_noflush_seq(), for telling the journal that entries
before a given sequence number should not be flushes - to be used by an
upcoming allocator optimization.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This patch ensures that the journal entry written gets written as flush
entry, which is important for the shutdown path - the last entry written
needs to be a flush entry.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds flags for options that must be a power of two (block size and
btree node size), and options that are stored in the superblock as a
power of two (encoded extent max).
Also: options are now stored in memory in the same units they're
displayed in (bytes): we now convert when getting and setting from the
superblock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This adds more latency/event measurements and breaks some apart into
more events. Journal writes are broken apart into flush writes and
noflush writes, btree compactions are broken out from btree splits,
btree mergers are added, as well as btree_interior_updates - foreground
and total.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This converts journal_write_delay, journal_flush_disabled, and
journal_reclaim_delay to normal filesystems options, and also adds them
to the superblock.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
- bch2_journal_halt() was unconditionally overwriting j->err_seq, the
sequence number that we failed to write
- journal_write_done was updating seq_ondisk and flushed_seq_ondisk even
for writes that errored, which broke the way bch2_journal_flush_seq_async()
locklessly checked for completions.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This tweaks the journal code to always act as if there's space available
in nochanges mode, when we're not going to be doing any writes. This
helps in recovering filesystems that won't mount because they need
journal replay and the journal has gotten stuck.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
If the last entry(ies) would be all zeros, there's no need to write them
out - the read path already handles that.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Journal write errors were racing with the submission path - potentially
causing writes to other replicas to not get submitted.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Also, clean up workqueue usage - we shouldn't be using system
workqueues, pretty much everything we do needs to be on our own
WQ_MEM_RECLAIM workqueues.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
jset->last_seq is in the region that's encrypted - on journal write
completion, we were using it and getting garbage. This patch shadows it
to fix.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch starts treating the bpos.snapshot field like part of the key
in the btree code:
* bpos_successor() and bpos_predecessor() now include the snapshot field
* Keys in btrees that will be using snapshots (extents, inodes, dirents
and xattrs) now always have their snapshot field set to U32_MAX
The btree iterator code gets a new flag, BTREE_ITER_ALL_SNAPSHOTS, that
determines whether we're iterating over keys in all snapshots or not -
internally, this controlls whether bkey_(successor|predecessor)
increment/decrement the snapshot field, or only the higher bits of the
key.
We add a new member to struct btree_iter, iter->snapshot: when
BTREE_ITER_ALL_SNAPSHOTS is not set, iter->pos.snapshot should always
equal iter->snapshot, which will be 0 for btrees that don't use
snapshots, and alsways U32_MAX for btrees that will use snapshots
(until we enable snapshot creation).
This patch also introduces a new metadata version number, and compat
code for reading from/writing to older versions - this isn't a forced
upgrade (yet).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The bkey compat code wasn't being run for btree roots in the superblock
clean section - this patch fixes it to use the journal entry validate
code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch standardizes all the enums that have associated string tables
(probably more enums should have string tables).
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We had a bug reported where the journal is failing to allocate a journal
write - this should help figure out what's going on.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This eliminates the need to scan every bucket to regenerate dev_usage at
mount time.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Originally, bcachefs - going back to bcache - stored, for each bucket, a
16 bit counter corresponding to how long it had been since the bucket
was read from. But, this required periodically rescaling counters on
every bucket to avoid wraparound. That wasn't an issue in bcache, where
we'd perodically rewrite the per bucket metadata all at once, but in
bcachefs we're trying to avoid having to walk every single bucket.
This patch switches to persisting 64 bit io clocks, corresponding to the
64 bit bucket timestaps introduced in the previous patch with
KEY_TYPE_alloc_v2.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a bug introduced by "bcachefs: Improve diagnostics when
journal entries are missing" - devices in a replicas entry are supposed
to be sorted.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Also, make journal writes obey foreground_target and metadata_target.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
There's an outstanding bug with journal entries being missing in journal
replay. This patch adds code to print out where the journal entries were
physically located that were around the entry(ies) being missing, which
should make debugging easier.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
All writes prior to a journal write need to be flushed before the
journal write itself happens. On single device filesystems, it suffices
to mark the write with REQ_PREFLUSH|REQ_FUA, but on multi device
filesystems we need to issue flushes to every device - and wait for them
to complete - before issuing the journal writes. Previously, we were
issuing flushes to every device, but we weren't waiting for them to
complete before issuing the journal writes.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This is because we had a bug where we were writing out journal entries
with garbage last_seq, and not catching it.
Also, completely ignore jset->last_seq when JSET_NO_FLUSH is true,
because of aforementioned bug, but change the write path to set last_seq
to 0 when JSET_NO_FLUSH is true.
Minor other cleanups and comments.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We don't want to fail the recovery/mount because of a single error
reading from the journal - the relevant journal entry may still be found
on other devices, and missing or no journal entries found is already
handled later in the recovery process.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
It used to be safe to reallocate a buf that the write path owns without
holding the journal lock, but now this can trigger an assertion in
journal_seq_to_buf().
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With various newer key types - stripe keys, inline data extents - the
old approach of calculating the maximum size of the value is becoming
more and more error prone. Better to switch to bkey_on_stack, which can
dynamically allocate if necessary to handle any size bkey.
In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- Try to always keep 1/8th of the journal free, on top of
pre-reservations
- Move the check for whether the journal is stuck to
bch2_journal_space_available, and make it only fire when there aren't
any journal writes in flight (that might free up space by updating
last_seq)
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch adds a flag to journal entries which, if set, indicates that
they weren't done as flush/fua writes.
- non flush/fua journal writes don't update last_seq (i.e. they don't
free up space in the journal), thus the journal free space
calculations now check whether nonflush journal writes are currently
allowed (i.e. are we low on free space, or would doing a flush write
free up a lot of space in the journal)
- write_delay_ms, the user configurable option for when open journal
entries are automatically written, is now interpreted as the max
delay between flush journal writes (default 1 second).
- bch2_journal_flush_seq_async is changed to ensure a flush write >=
the requested sequence number has happened
- journal read/replay must now ignore, and blacklist, any journal
entries newer than the most recent flush entry in the journal. Also,
the way the read_entire_journal option is handled has been improved;
struct journal_replay now has an entry, 'ignore', for entries that
were read but should not be used.
- assorted refactoring and improvements related to journal read in
journal_io.c and recovery.c
Previously, we'd have to issue a flush/fua write every time we
accumulated a full journal entry - typically the bucket size. Now we
need to issue them much less frequently: when an fsync is requested, or
it's been more than write_delay_ms since the last flush, or when we need
to free up space in the journal. This is a significant performance
improvement on many write heavy workloads.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch increases the maximum journal buffers in flight from 2 to 4 -
this will be particularly helpful when in the future we stop requiring
flush+fua for every journal write.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
it's useful to know whether an error was for a read or a write - this
also standardizes error messages a bit more.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When we detect bad keys in the journal that have to be dropped, the flow
control was wrong - we ended up not checking the next key in that entry.
Oops.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Improved the way we track various state by adding j->err_seq, which
records the first journal sequence number that encountered an error
being written, and j->last_empty_seq, which records the most recent
journal entry that was completely empty.
Also, use the low bits of the journal sequence number to index the
corresponding journal_buf.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>