Check that if an inode has a backpointer, the dirent it points to points
back to the inode.
We do this in check_dirent_inode_dirent(), but only for inodes that have
dirents that point to them - we also have to do the check starting from
the inode to catch inodes that don't have dirents that point to them.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
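The inode-side check described in the commit above can be illustrated with a minimal userspace sketch. It is not the bcachefs implementation: the bp_* structures, fields, and helpers below are hypothetical stand-ins, and a flat array replaces the dirents btree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the on-disk structures. */
struct bp_dirent {
	uint64_t dir;         /* inode number of the containing directory */
	uint64_t offset;      /* position of the dirent within that directory */
	uint64_t target_inum; /* inode number this dirent points to */
};

struct bp_inode {
	uint64_t inum;
	bool     has_backpointer;
	uint64_t bp_dir;      /* backpointer: directory ... */
	uint64_t bp_offset;   /* ... and offset of the dirent that names us */
};

/* Linear lookup of a dirent by (dir, offset); NULL if absent. */
static const struct bp_dirent *
bp_dirent_lookup(const struct bp_dirent *dirents, size_t nr,
		 uint64_t dir, uint64_t offset)
{
	for (size_t i = 0; i < nr; i++)
		if (dirents[i].dir == dir && dirents[i].offset == offset)
			return &dirents[i];
	return NULL;
}

/*
 * Start from the inode: if it has a backpointer, the dirent at that
 * position must exist and must point back at this inode. This catches
 * inodes whose backpointer names a missing or unrelated dirent, which a
 * walk that starts from the dirents would never visit.
 */
static bool bp_inode_backpointer_ok(const struct bp_inode *inode,
				    const struct bp_dirent *dirents, size_t nr)
{
	assert(inode != NULL);

	if (!inode->has_backpointer)
		return true;

	const struct bp_dirent *d =
		bp_dirent_lookup(dirents, nr, inode->bp_dir, inode->bp_offset);

	return d != NULL && d->target_inum == inode->inum;
}
```

An inode whose backpointer names a missing dirent, or a dirent that targets a different inode, fails the check; an inode with no backpointer passes trivially.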
Subvolumes and subvolume root inodes point to each other: this verifies
the subvolume -> inode -> subvolume path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This converts -EIOs related to btree node errors to private error codes,
which will help with some ongoing debugging by giving us better error
messages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a flush op, to return the exit code via close().
Also update bcachefs usage to use this to return fsck exit codes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Make it so that a thread_with_stdio user can handle ioctls against the
file descriptor.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Create an ops structure so we can add more file-based functionality in
the next few patches.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Experimentally fix some problems with stdio_redirect_vprintf by creating
a MOO variant with which we can experiment. We can't do a GFP_KERNEL
allocation while holding the spinlock, and I don't like how the printf
function can silently truncate the output if memory allocation fails.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Create a new run_thread_with_stdout function that opens a file in
O_RDONLY mode so that the kernel can write things to userspace but
userspace cannot write to the kernel. This will be used to convey xfs
health event information to userspace.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes a bug where we'd return data without waiting for a newline,
if data was present but a newline was not.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
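The readline condition fixed in the commit above can be sketched in userspace as follows. The helper name is hypothetical and the buffer is a plain byte array rather than the kernel's stdio redirect buffer; the point is only that "data is present" and "a full line is present" are different conditions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return how many bytes of buf[0..len) form a complete line, including
 * the terminating '\n', or 0 if no newline has arrived yet. A
 * line-oriented reader should keep waiting while this returns 0, even
 * when len > 0.
 */
static size_t rl_complete_line_len(const char *buf, size_t len)
{
	assert(buf != NULL);

	const char *nl = memchr(buf, '\n', len);

	return nl ? (size_t)(nl - buf) + 1 : 0;
}
```

A reader that tested only len > 0 would hand back a partial line such as "abc"; testing the return value of this helper instead makes it wait until the newline arrives.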
Move the cleanup code to a wrapper function, where we can call it after
the thread_with_stdio fn exits.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- eliminate the dependency on printbufs, so that we can lift
thread_with_file for use in xfs
- add a nonblocking parameter to stdio_redirect_printf(), and either
block if the buffer is full or drop it on the floor - don't buffer
infinitely
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The output buffer lock has to be a spinlock so that we can write to it
from interrupt context, so we can't use a direct copy_to_user; this
switches thread_with_file_read() to use fault_in_writeable() and
copy_to_user_nofault(), similar to how thread_with_file_write() works.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Merge tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull block handle updates from Christian Brauner:
"Last cycle we changed opening of block devices, and opening a block
device would return a bdev_handle. This allowed us to implement
support for restricting and forbidding writes to mounted block
devices. It was accompanied by converting and adding helpers to
operate on bdev_handles instead of plain block devices.
That was already a good step forward but ultimately it isn't necessary
to have special purpose helpers for opening block devices internally
that return a bdev_handle.
Fundamentally, opening a block device internally should just be
equivalent to opening files. So now all internal opens of block
devices return files just as a userspace open would. Instead of
introducing a separate indirection into bdev_open_by_*() via struct
bdev_handle bdev_file_open_by_*() is made to just return a struct
file. Opening and closing a block device just becomes equivalent to
opening and closing a file.
This all works well because internally we already have a pseudo fs for
block devices and so opening block devices is simple. There are a few
places where we needed to be careful such as during boot when the
kernel is supposed to mount the rootfs directly without init doing it.
Here we need to take care to ensure that we flush out any asynchronous
file close. That's what we already do for opening, unpacking, and
closing the initramfs. So nothing new here.
The equivalence of opening and closing block devices to regular files
is a win in and of itself. But it also has various other advantages.
We can remove struct bdev_handle completely. Various low-level helpers
are now private to the block layer. Other helpers were simply
removable completely.
A follow-up series that is already reviewed builds on this and makes it
possible to remove bdev->bd_inode and allows various clean ups of the
buffer head code as well. All places where we stashed a bdev_handle
now just stash a file and use simple accessors to get to the actual
block device, which was already the case for bdev_handle."
* tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
  block: remove bdev_handle completely
  block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access
  bdev: remove bdev pointer from struct bdev_handle
  bdev: make struct bdev_handle private to the block layer
  bdev: make bdev_{release, open_by_dev}() private to block layer
  bdev: remove bdev_open_by_path()
  reiserfs: port block device access to file
  ocfs2: port block device access to file
  nfs: port block device access to files
  jfs: port block device access to file
  f2fs: port block device access to files
  ext4: port block device access to file
  erofs: port device access to file
  btrfs: port device access to file
  bcachefs: port block device access to file
  target: port block device access to file
  s390: port block device access to file
  nvme: port block device access to file
  block2mtd: port device access to files
  bcache: port block device access to files
  ...
When a dirent points to a missing inode, we really should print out the
dirent.
This requires quite a bit of refactoring, but there are some other
benefits: we now do the entire lookup (dirent and inode) in a single
btree transaction, and copy to the VFS inode with btree locks still
held, like the create path.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Repurposing standard error codes in bcachefs code is banned in new code,
and we need to get rid of the remaining ones - private error codes give
us much better error messages.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
WQ_UNBOUND with max_active 1 means ordered workqueue, but we don't
actually need or want ordered semantics - and probably want a higher
concurrency limit anyway.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Switch the statfs code from something horrible and open-coded to the
more standard uuid_to_fsid().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
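For readers unfamiliar with the helper named in the commit above, here is a userspace sketch of the folding it performs, as far as we understand the kernel's uuid_to_fsid(): the 16-byte UUID is read as two little-endian 64-bit halves, the halves are XORed, and the result is split into two 32-bit words. The fs_* names and the fs_fsid type are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's two-word fsid type. */
struct fs_fsid {
	uint32_t val[2];
};

/* Read 8 bytes as a little-endian 64-bit integer, independent of host order. */
static uint64_t fs_load_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Fold a 16-byte UUID into a 64-bit fsid by XORing its two halves. */
static struct fs_fsid fs_uuid_to_fsid(const uint8_t *uuid)
{
	assert(uuid != NULL);

	uint64_t v = fs_load_le64(uuid) ^ fs_load_le64(uuid + 8);

	return (struct fs_fsid){ .val = { (uint32_t)v, (uint32_t)(v >> 32) } };
}
```

Because the fold is a plain XOR, a UUID whose two halves are identical maps to an all-zero fsid. That is a property of this sketch, not a claim about how bcachefs generates its UUIDs.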
Files within a subvolume cannot be renamed into another subvolume, but
subvolumes themselves were intended to be.
This implements subvolume renaming - we need to ensure that there's only
a single dirent that points to a subvolume key (not multiple versions in
different snapshots), and we need to ensure that dirent.d_parent_subvol
and inode.bi_parent_subvol are updated.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
btree_and_journal_iter is old code that we want to get rid of, but we're
not ready to yet.
Lack of btree node prefetching is, it turns out, a real performance
issue for fsck on spinning rust, so - add it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We now always have a btree_trans when using a btree_and_journal_iter;
this is prep work for adding prefetching to btree_and_journal_iter.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Recently a severe performance regression was discovered, which bisected
to
  a6548c8b5eb5 bcachefs: Avoid flushing the journal in the discard path
It turns out the old behaviour, which issued excessive journal flushes,
worked around a performance issue where queueing delays would cause the
journal to not be able to write quickly enough and stall.
The journal flushes masked the issue because they periodically flushed
the device write cache, reducing write latency for non-flush writes.
This patch reworks the journalling code to allow more than one
(non-flush) write to be in flight at a time. With this patch, doing 4k
random writes with an iodepth of 128, we are now able to hit 560k iops to
a Samsung 970 EVO Plus - previously, we were stuck in the ~200k range.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This gives us a way to record the date and time every journal entry was
written - useful for debugging.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Drop an unnecessary bch2_subvolume_get_snapshot() call, and drop the __
from the name - this is a normal interface.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Most bcachefs workqueues are used for completions, and should be
WQ_HIGHPRI. This helps reduce queuing delays: we want to complete
quickly once we can no longer signal backpressure by blocking.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Previously, any time we failed to get a journal reservation we'd retry,
with the journal lock held; but this isn't necessary given
wait_event()/wake_up() ordering.
This avoids performance cliffs when the journal starts to get backed up
and lock contention shoots up.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We don't want journal write completions to be blocked behind btree
transactions - io_complete_wq is used for btree updates after data and
metadata writes.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When we are checking whether a subvolume is empty in the specified snapshot,
entries that do not belong to this subvolume should be skipped.
This fixes the following case:
    $ bcachefs subvolume create ./sub
    $ cd sub
    $ bcachefs subvolume create ./sub2
    $ bcachefs subvolume snapshot . ./snap
    $ ls -a snap
    . ..
    $ rmdir snap
    rmdir: failed to remove 'snap': Directory not empty
As Kent suggested, we pass 0 in may_delete_deleted_inode() to ignore subvols
in the subvol we are checking, because inode.bi_subvol is only set on
subvolume roots, and we can't go through every inode in the subvolume and
change bi_subvol when taking a snapshot. It makes the check less strict, but
that's ok, the rest of fsck will still catch it.
Signed-off-by: Guoyu Ou <benogy@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
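The emptiness rule described in the commit above can be sketched in userspace as follows. This is a toy model under stated assumptions, not the bcachefs code: the sv_* names are hypothetical, a flat array replaces the dirents btree, and each subvolume dirent is assumed to record the subvolume it was created in (in the spirit of the d_parent_subvol field).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dirent types for this sketch. */
enum sv_dtype { SV_DT_REG, SV_DT_DIR, SV_DT_SUBVOL };

struct sv_dirent {
	enum sv_dtype type;
	uint32_t      parent_subvol; /* only meaningful for SV_DT_SUBVOL */
};

/*
 * A directory counts as empty, as seen from `subvol`, if every entry in
 * it is a subvolume dirent that belongs to some other subvolume. Such
 * entries are inherited through the snapshot but are not visible here,
 * so they must not make rmdir fail with "Directory not empty".
 */
static bool sv_dir_is_empty(const struct sv_dirent *ents, size_t nr,
			    uint32_t subvol)
{
	assert(ents != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (ents[i].type == SV_DT_SUBVOL &&
		    ents[i].parent_subvol != subvol)
			continue; /* invisible in this subvolume: skip */
		return false;
	}
	return true;
}
```

In the reproduction case, the dirent for sub2 was created in the original subvolume, so it is skipped when checking the snapshot and snap counts as empty. In this toy model, passing 0 as subvol skips every subvolume dirent, provided real subvolume IDs are nonzero.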