IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
If we're looking for a bcachefs supers iteratively we don't want to see
this error.
This function replaces KERN_ERR with KERN_INFO for when we don't find a
bcachefs superblock but preserves other errors.
Signed-off-by: Daniel Hill <daniel@gluo.nz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_update_cached_sectors_list() is closer to how the new disk space
accounting works, called from trans_mark().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
For BTREE_ITER_WITH_JOURNAL, we memoize lookups in the journal keys, to
avoid the binary search overhead.
Previously we stashed the pos of the last key returned from the journal,
in order to force the lookup to be redone when rewinding.
Now bch2_journal_keys_peek_upto() handles rewinding itself when
necessary - so we can slim down btree_iter.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The flush_all_pins() after journal replay was unecessary, and trying to
completely flush the journal while RW is not a great idea - it's not
guaranteed to terminate if other threads keep adding things to the
jorunal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
As discussed in the previous patch, BTREE_ITER_ALL_LEVELS appears to be
racy with concurrent interior node updates - and perhaps it is fixable,
but it's tricky and unnecessary.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
It appears that BTREE_ITER_ALL_LEVELS is racy with concurrent interior
node btree updates; unfortunate but not terribly surprising it's a
difficult problem - that was the original reason for gc_lock.
BTREE_ITER_ALL_LEVELS will probably be deleted in a subsequent patch,
this changes backpointers fsck to instead walk keys at one level of the
btree at a time.
This fixes the tiering_drop_alloc test, which stopped working with the
patch to not flush the journal after journal replay.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Create a separate enum for str_hash flags - instead of abusing the
btree_insert_flags enum - and create a __bitwise typedef for sparse
typechecking.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
path->level was being read, but never used.
Reported-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Journal replay now first attempts to replay keys in sorted order,
similar to how the btree write buffer flush path works.
Any keys that can not be replayed due to journal deadlock are then left
for later and replayed in journal order, unpinning journal entries as we
go.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This gets us slightly nicer log messages.
Also, this slightly clarifies synchronization of c->journal_keys; after
we go RW it's in use by multiple threads (so that the btree iterator
code can overlay keys from the journal); so it has to be prepped before
that point.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
With the previous patch that reworks BTREE_INSERT_JOURNAL_REPLAY, we can
now switch the btree write buffer to use it for flushing.
This has the advantage that transaction commits don't need to take a
journal reservation at all.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This slightly changes how trans->journal_res works, in preparation for
changing the btree write buffer flush path to use it.
Now, BTREE_INSERT_JOURNAL_REPLAY means "don't take a journal
reservation; trans->journal_res.seq already refers to the journal
sequence number to pin".
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The upcoming btree write buffer rework is going to use the journal
itself as the first stage of the write buffer; this is a cleanup to make
sure k->needs_whiteout is initialized before keys hit the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This introduces a new helper for connecting time_stats to state changes,
i.e. when taking journal reservations is blocked for some reason.
We use this to track separately the different reasons the journal might
be blocked - i.e. space in the journal full, or the journal pin fifo
full.
Also do some cleanup and improvements on the time stats code.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, bch2_journal_pin_set() would silently ignore a request to
pin a journal sequence number that was no longer dirty, because it was
used internally by bch2_journal_pin_copy() which could race with the src
pin being flushed.
Split these apart so that we can properly assert that @seq is a
currently dirty journal sequence number - this is almost always a bug.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In an ideal world, we'd have a common helper that could be used for
sorting a list of inodes into the correct lock order, and then the same
lock ordering could be used for any type of inode lock, not just
i_rwsem.
But the lock ordering rules for i_rwsem are a bit complicated, so -
abandon that dream for now and do it the more standard way.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Also log time waiting for c->writes references to be dropped; this will
help in debugging why unmounts are taking longer than they should.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
It's confusing if we run fsck a second time (in debug mode, to verify
the second run is clean), but errors are still ratelimited from the
first run.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add checks to all the VFS paths for "are we in a RO snapshot?".
Note - we don't check this when setting inode options via our xattr
interface, since those generally only affect data placement, not
contents of data.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>
Add a new superblock section that contains a list of
{ minor version, recovery passes, errors_to_fix }
that is - a list of recovery passes that must be run when downgrading
past a given version, and a list of errors to silently fix.
The upcoming disk accounting rewrite is not going to be fully
compatible: we're going to have to regenerate accounting both when
upgrading to the new version, and also from downgrading from the new
version, since the new method of doing disk space accounting is a
completely different architecture based on deltas, and synchronizing
them for every jounal entry write to maintain compatibility is going to
be too expensive and impractical.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add two new superblock fields. Since the main section of the superblock
is now fully, we have to add a new variable length section for them -
bch_sb_field_ext.
- recovery_passes_requried: recovery passes that must be run on the
next mount
- errors_silent: errors that will be silently fixed
These are to improve upgrading and dwongrading: these fields won't be
cleared until after recovery successfully completes, so there won't be
any issues with crashing partway through an upgrade or a downgrade.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The next patch will start to refer to recovery passes from the
superblock; naturally, we now need identifiers that don't change, since
the existing enum is in the order in which they are run and is not
fixed.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
BCH_REPLICAS_MAX isn't the actual maximum number of pointers in an
extent, it's the maximum number of dirty pointers.
We don't have a real restriction on the number of cached pointers, and
we don't want a fixed size array here anyways - so switch to
DARRAY_PREALLOCATED().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-and-tested-by: Daniel J Blueman <daniel@quora.org>
We sometimes use darrays for quite large buffers - the btree write
buffer in particular needs large buffers, since it must be sized to hold
all the write buffer keys outstanding in the journal.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Move the slowpath (actually growing the darray) to an out-of-line
function; also, add some helpers for the upcoming btree write buffer
rewrite.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
If a superblock write hasn't happened (i.e. we never had to go rw), then
c->sb.version will be out of date w.r.t. c->disk_sb.sb->version.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
turns out iterate_iovec() mutates __iov, we need to save our own copy
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: Marcin Mirosław <marcin@mejor.pl>
peek_upto() checks against the end position and bails out before
FILTER_SNAPSHOTS checks; this is because if we end up at a different
inode number than the original search key none of the keys we see might
be visibile in the current snapshot - we might be looking at inode in a
completely different subvolume.
But this is broken, because when we're iterating over extents we're
checking against the extent start position to decide when to bail out,
and the extent start position isn't monotonically increasing until after
we've run FILTER_SNAPSHOTS.
Fix this by adding a simple inode number check where the old bailout
check was, and moving the main check to the correct position.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Reported-by: "Carl E. Thompson" <list-bcachefs@carlthompson.net>
- Fix readers that are blocked on the ring buffer when buffer_percent is
100%. They are supposed to wake up when the buffer is full, but
because the sub-buffer that the writer is on is never considered
"dirty" in the calculation, dirty pages will never equal nr_pages.
Add +1 to the dirty count in order to count for the sub-buffer that
the writer is on.
- When a reader is blocked on the "snapshot_raw" file, it is to be
woken up when a snapshot is done and be able to read the snapshot
buffer. But because the snapshot swaps the buffers (the main one
with the snapshot one), and the snapshot reader is waiting on the
old snapshot buffer, it was not woken up (because it is now on
the main buffer after the swap). Worse yet, when it reads the buffer
after a snapshot, it's not reading the snapshot buffer, it's reading
the live active main buffer.
Fix this by forcing a wakeup of all readers on the snapshot buffer when
a new snapshot happens, and then update the buffer that the reader
is reading to be back on the snapshot buffer.
- Fix the modification of the direct_function hash. There was a race
when new functions were added to the direct_function hash as when
it moved function entries from the old hash to the new one, a direct
function trace could be hit and not see its entry.
This is fixed by allocating the new hash, copy all the old entries
onto it as well as the new entries, and then use rcu_assign_pointer()
to update the new direct_function hash with it.
This also fixes a memory leak in that code.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZZAzTxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qs9IAP9e6wZ74aEjMED9nsbC49EpyCNTqa72
y0uDS/p9ppv52gD7Be+l+kJQzYNh6bZU0+B19hNC2QVn38jb7sOadfO/1Q8=
=NDkf
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix readers that are blocked on the ring buffer when buffer_percent
is 100%. They are supposed to wake up when the buffer is full, but
because the sub-buffer that the writer is on is never considered
"dirty" in the calculation, dirty pages will never equal nr_pages.
Add +1 to the dirty count in order to count for the sub-buffer that
the writer is on.
- When a reader is blocked on the "snapshot_raw" file, it is to be
woken up when a snapshot is done and be able to read the snapshot
buffer. But because the snapshot swaps the buffers (the main one with
the snapshot one), and the snapshot reader is waiting on the old
snapshot buffer, it was not woken up (because it is now on the main
buffer after the swap). Worse yet, when it reads the buffer after a
snapshot, it's not reading the snapshot buffer, it's reading the live
active main buffer.
Fix this by forcing a wakeup of all readers on the snapshot buffer
when a new snapshot happens, and then update the buffer that the
reader is reading to be back on the snapshot buffer.
- Fix the modification of the direct_function hash. There was a race
when new functions were added to the direct_function hash as when it
moved function entries from the old hash to the new one, a direct
function trace could be hit and not see its entry.
This is fixed by allocating the new hash, copy all the old entries
onto it as well as the new entries, and then use rcu_assign_pointer()
to update the new direct_function hash with it.
This also fixes a memory leak in that code.
- Fix eventfs ownership
* tag 'trace-v6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
ftrace: Fix modification of direct_function hash while in use
tracing: Fix blocked reader of snapshot buffer
ring-buffer: Fix wake ups when buffer_percent is set to 100
eventfs: Fix file and directory uid and gid ownership