IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Since outstanding journal buffers hold a journal pin, when flushing all
pins we need to close the current journal entry if necessary so its pin
can be released.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Validation was completely missing for replicas entries in the journal
(not the superblock replicas section) - we can't have replicas entries
pointing to invalid devices.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Control flow integrity is now checking that type signatures match on
indirect function calls. That breaks closures, which embed a work_struct
in a closure in such a way that a closure_fn may also be used as a
workqueue fn by the underlying closure code.
So we have to change closure fns to take a work_struct as their
argument - but that results in a loss of clarity, as closure fns have
different semantics from normal workqueue functions (they run owning a
ref on the closure, which must be released with continue_at() or
closure_return()).
Thus, this patc introduces CLOSURE_CALLBACK() and closure_type() macros
as suggested by Kees, to smooth things over a bit.
Suggested-by: Kees Cook <keescook@chromium.org>
Cc: Coly Li <colyli@suse.de>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The journal read path had some informational log statements preperatory
for ZNS support - they're not of interest to users, so we can turn them
off.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Split up bch2_journal_write() to simplify locking:
- bch2_journal_write_pick_flush(), which needs j->lock
- bch2_journal_write_prep, which operates on the journal buffer to be
written and will need the upcoming buf_lock for synchronization with
the btree write buffer flush path
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repair,
and hence what bugs to look for.
Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We now track IO errors per device since filesystem creation.
IO error counts can be viewed in sysfs, or with the 'bcachefs
show-super' command.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_journal_write() expects process context, it takes journal_lock as
needed.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
More reorganization, this splits up io.c into
- io_read.c
- io_misc.c - fallocate, fpunch, truncate
- io_write.c
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes the replicas_write_errors test: the patch
bcachefs: mark journal replicas before journal write submission
partially fixed replicas marking for the journal, but it broke the case
where one replica failed - this patch re-adds marking after the journal
write completes, when we know how many replicas succeeded.
Additionally, we do not consider it a fsck error when the very last
journal entry is not correctly marked, since there is an inherent race
there.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a bug where we were already passing bkey_invalid_flags
around, but treating the parameter as just read/write - so the compat
code wasn't being run correctly.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This introduces major/minor versioning to the superblock version number.
Major version number changes indicate incompatible releases; we can move
forward to a new major version number, but not backwards. Minor version
numbers indicate compatible changes - these add features, but can still
be mounted and used by old versions.
With the recent patches that make it possible to roll out new btrees and
key types without breaking compatibility, we should be able to roll out
most new features without incompatible changes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
As part of the forward compatibility patch series, we need to allow for
new key types without complaining loudly when running an old version.
This patch changes the flags parameter of bkey_invalid to an enum, and
adds a new flag to indicate we're being called from the transaction
commit path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds a new helper for checking if an on-disk version is compatible
with the running version of bcachefs - prep work for introducing
major:minor version numbers.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This unifies JOURNAL_WATERMARK with BCH_WATERMARK; we're working towards
specifying watermarks once in the transaction commit path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
GFP_NOIO dates from the bcache days, when we operated under the block
layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to
GFP_NOFS.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
A user hit this BUG_ON() - it's unclear how it happened, so replace it
with a fatal error that will cause us to go read only, and print out
more information.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The journal write submission path marks the associated replica
entries for journal data in journal_write_done(), which is just
after journal write bio submission. This creates a small window
where journal entries might have been written out, but the
associated replica is not marked such that recovery does not know
that the associated device contains journal data.
Move the replica marking a bit earlier in the write path such that
recovery is guaranteed to recognize that the device contains journal
data in the event of a crash.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds private error codes for most (but not all) of our ENOMEM uses,
which makes it easier to track down assorted allocation failures.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- __bch2_bkey_drop_ptr() -> bch2_bkey_drop_ptr_noerror(), now available
outside extents.
- Split bch2_bkey_has_device() and bch2_bkey_has_device_c(), const and
non const versions
- bch2_extent_has_ptr() now returns the pointer it found
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Rust bindgen doesn't cope well with anonymous structs and unions. This
patch drops the fancy anonymous structs & unions in bkey_i that let us
use the same helpers for bkey_i and bkey_packed; since bkey_packed is an
internal type that's never exposed to outside code, it's only a minor
inconvenienc.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This brings back journal_entries_compact(), but in a more efficient form
- we need to do multiple postprocess steps, so iterate over the
journal entries being written just once to make it more efficient.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch
- Adds a mechanism for queuing up journal entries prior to the journal
being started, which will be used for early journal log messages
- Adds bch2_fs_log_msg() and improves bch2_trans_log_msg(), which now
take format strings. bch2_fs_log_msg() can be used before or after
the journal has been started, and will use the appropriate mechanism.
- Deletes the now obsolete bch2_journal_log_msg()
- And adds more log messages to the recovery path - messages for
journal/filesystem started, journal entries being blacklisted, and
journal replay starting/finishing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If it so happens that we crash while dirty, meaning we don't have the
superblock clean section, and we erroneously mark a journal entry we
wrote as blacklisted, we won't be able to recover.
This patch fixes this by adding a fallback: if we've got no superblock
clean section, and no non-ignored journal entries, we try the most
recent ignored journal entry.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This just cleans up and simplifies the code that decides where to resume
writing in the journal - when the code was originally written we weren't
saving the precise location of every journal write found.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
On startup, we need to ensure the first journal entry written is a flush
write: after a clean shutdown we generally don't read the journal, which
means we might be overwriting whatever was there previously, and there
must always be at least one flush entry in the journal or recovery will
fail.
Found by fstests generic/388.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This tweaks the recovery and journal paths so that we don't error out
before we need to: the list_journal command should work, even if we
wouldn't be able to replay successfully.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If the last journal write didn't complete sucessfully due to a torn
write, we'll detect it as a checksum error. In that case, we should just
pretend that journal entry was never written.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Print out the journal entries we read and will replay as soon as
possible - if we get an error walidating keys it's helpful to know where
it was in the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- Centralize format strings in bcachefs.h
- Add bch2_fmt_inum_offset() and related helpers
- Switch error messages for inodes to also print out the offset, in
bytes
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
On journal read, previously we would do full journal entry validation
immediately after reading a journal entry.
However, this would lead to errors for journal entries we weren't
actually going to use, either because they were too old or too new
(newer than the most recent flush).
We've observed write tearing on journal entries newer than the newest
flush - which makes sense, prior to a flush there's no guarantees about
write persistence.
This patch defers full journal entry validation until the end of the
journal read path, when we know which journal entries we'll want to use.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Prep work for the next patch, to defer journal entry validation: we now
track for each replica whether we had a good checksum.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, jset_validate() was formatting the initial part of an error
string for every entry it validating - expensive.
This moves that code to journal_entry_err_msg(), which is now only
called if there's an actual error.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This converts bcachefs to the modern printbuf interface/implementation,
synced with the version to be submitted upstream.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Lately we've been doing a lot of debugging by looking at the journal to
see what was changed, and by what code path. This patch adds a new
journal entry type for recording overwrites, so that we don't have to
search backwards through the journal to see what was being overwritten
in order to work out what the triggers were supposed to be doing.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
This option was useful when the replicas mechism was new and still being
debugged, but hasn't been used in ages - let's delete it.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
In journal_entry_add(), we were repeatedly scanning the journal entries
radix tree to scan for old entries that can be freed, with O(n^2)
behaviour. This patch tweaks things to remember the previous last_seq,
so we don't have to scan for entries to free from the start.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>