linux

Author	SHA1	Message	Date
Kent Overstreet	90aa35c4c9	bcachefs: Add journal.blocked to journal_debug_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	d9290c9931	bcachefs: Fix journal_buf bitfield accesses All journal_buf bitfield updates must happen under the journal lock - perhaps we should just switch these to atomic bit flags. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	a393f33123	bcachefs: Split out discard fastpath Buckets usually can't be discarded until the transaction that made them empty has been committed in the journal. Tracing has indicated that we're queuing the discard worker excessively, only for it to skip over many buckets that are still waiting on a journal commit, discarding only one or two buckets per iteration. We want to switch to only queuing the discard worker after a journal flush write, but there's an important optimization we need to preserve: if a bucket becomes empty and it was never committed in the journal while it was in use, we want to discard it and reuse it right away - since overwriting it before the previous writes are flushed from the device cache means those writes only cost bus bandwidth. So, this patch implements a fast path for buckets that can be discarded right away. We need new locking between the two discard workers; the new list of buckets being discarded provides that locking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	06d493fee4	bcachefs: improve bch2_journal_buf_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	29e11f9699	bcachefs: Drop redundant btree_path_downgrade()s If a path doesn't have any active references, we shouldn't downgrade it; it'll either be reused, possibly with intent refs again, or dropped at bch2_trans_begin() time. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Daniel Hill	ba78af9e56	bcachefs: rebalance_status now shows correct units Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	3235e04afe	bcachefs: more informative write path error message Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	74406f66ad	bcachefs: check_path() now only needs to walk up to subvolume root Now that checking subvolume structure is a separate pass, the main check_directory_connectivity() pass only needs to walk up to a given inode's subvolume root. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	663db5a554	bcachefs: bch2_check_subvolume_structure() Now that we've got bch_subvolume.fs_path_parent, it's easy to write subvolume Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Thomas Bertschinger	b07ce72626	bcachefs: omit alignment attribute on big endian struct bkey This is needed for building Rust bindings on big endian architectures like s390x. Currently this is only done in userspace, but it might happen in-kernel in the future. When creating a Rust binding for struct bkey, the "packed" attribute is needed to get a type with the correct member offsets in the big endian case. However, rustc does not allow types to have both a "packed" and "align" attribute. Thus, in order to get a Rust type compatible with the C type, we must omit the "aligned" attribute in C. This does not affect the struct's size or member offsets, only its toplevel alignment, which should be an acceptable impact. The little endian version can have the "align" attribute because the "packed" attr is redundant, and rust-bindgen will omit the "packed" attr when an "align" attr is present and it can do so without changing a type's layout Signed-off-by: Thomas Bertschinger <tahbertschinger@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:25 -04:00
Kent Overstreet	6e9d0558b1	bcachefs: bch2_trigger_alloc() handles state changes better bch2_trigger_alloc() kicks off certain tasks on bucket state changes; e.g. triggering the bucket discard worker and the invalidate worker. We've observed the discard worker running too often - most runs it doesn't do any work, according to the tracepoint - so clearly, we're kicking it off too often. This adds an explicit statechange() macro to make these checks more precise. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	b63570f747	bcachefs: bch2_print_opts() Make sure early error messages get redirected, for kernel-fsck-from-userland. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	130d229ff5	bcachefs: Improve error messages in device remove path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	5ca8ff157d	bcachefs: Use kvzalloc() when dynamically allocating btree paths This silences a mm/page_alloc.c warning about allocating more than a page with GFP_NOFAIL - and there's no reason for this to not have a vmalloc fallback anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	83bd5985fa	bcachefs: Track iter->ip_allocated at bch2_trans_copy_iter() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	3254c1b0e5	bcachefs: Save key_cache_path in peek_slot() When bch2_btree_iter_peek_slot() clones the iterator to search for the next key, and then discovers that the key from the cloned iterator is the key we want to return - we also want to save the iter->key_cache_path as well, for the update path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	91dcad18d3	bcachefs: Pin btree cache in ram for random access in fsck Various phases of fsck involve checking references from one btree to another: this means doing a sequential scan of one btree, and then mostly random access into the second. This is particularly painful for checking extents <-> backpointers; we can prefetch btree node access on the sequential scan, but not on the random access portion, and this is particularly painful on spinning rust, where we'd like to keep the pipeline fairly full of btree node reads so that the elevator can reduce seeking. This patch implements prefetching and pinning of the portion of the btree that we'll be doing random access to. We already calculate how much of the random access btree will fit in memory so it's a fairly straightforward change. This will put more pressure on system memory usage, so we introduce a new option, fsck_memory_usage_percent, which is the percentage of total system ram that fsck is allowed to pin. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	835cd3e147	bcachefs: Check for subvolume children when deleting subvolumes Recursively destroying subvolumes isn't allowed yet. Fixes: https://github.com/koverstreet/bcachefs/issues/634 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	b26d79147f	bcachefs: BTREE_ID_subvolume_children Add a btree to record parent -> child subvolume relationships, according to the filesystem hierarchy. The subvolume_children btree is a bitset btree: if a bit is set at pos p, that means p.offset is a child of subvolume p.inode. This will be used for efficiently listing subvolumes, as well as recursive deletion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	b8628a2529	bcachefs: bch_subvolume::fs_path_parent Record the filesystem path hierarchy for subvolumes in bch_subvolume Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	e07c28ab92	bcachefs: bch2_btree_bit_mod() Provide a non-write buffer version of bch2_btree_bit_mod_buffered(), for the subvolume children btree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	506b187603	bcachefs: bch2_btree_bit_mod -> bch2_btree_bit_mod_buffered Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	56e230473d	bcachefs: Correctly reattach subvolumes Subvolumes need special handling to reattach - we always reattach them in the root subvolume's lost+found, and they need a slightly different kind of dirent. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	3a136177f3	bcachefs: check_path() now prints full inode when reattaching Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	688a769409	bcachefs: Pass inode bkey to check_path() prep work for improving logging/error messages Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	f5d58d0c72	bcachefs: Fix path where dirent -> subvol missing and we don't fix Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	64304aaf4e	bcachefs: bch_subvolume::parent -> creation_parent bit of renaming, prep for adding a fs path parent Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	45b4ed525e	bcachefs: Repair subvol dirents that point to non subvols when repair switches d_type to or from DT_SUBVOL, we need to update the target accordingly Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	c60b7f803c	bcachefs: check dirent->d_parent_subvol Check that d_parent_subvol makes sense - the dirent's snapshot must be visible in d_parent_subvol (i.e. an ancestor of d_parent_subvol's snapshot) in order to be visible. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	f4e68c859f	bcachefs: check inode->bi_parent_subvol against dirent Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	ea27001e14	bcachefs: delete duplicated checks in check_dirent_to_subvol() these were already checked in check_subvol() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	e539ebb867	bcachefs: simplify check_dirent_inode_dirent() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:24 -04:00
Kent Overstreet	0b498a5a39	bcachefs: check bi_parent_subvol in check_inode() check for inodes with a nonzero bi_parent_subvol field that aren't actually subvolume roots Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:23 -04:00
Kent Overstreet	971a1503a2	bcachefs: better log message in lookup_inode_for_snapshot() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:23 -04:00
Kent Overstreet	0b17618fdc	bcachefs: check_inode_dirent_inode() check that if an inode has a backpointer, the dirent it points to points back to it. We do this in check_dirent_inode_dirent(), but only for inodes that have dirents that point to them - we also have to do the check starting from the inode to catch inodes that don't have dirents that point to them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:23 -04:00
Kent Overstreet	f2b02d099c	bcachefs: Check subvol <-> inode pointers in check_inode() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:23 -04:00
Kent Overstreet	4c20278eb1	bcachefs: Check subvol <-> inode pointers in check_subvol() Subvolumes and subvolume root inodes point to each other: this verifies the subvolume -> inode -> subvolume path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:23 -04:00
Kent Overstreet	52946d828a	bcachefs: Kill more -EIO error codes This converts -EIOs related to btree node errors to private error codes, which will help with some ongoing debugging by giving us better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:23 -04:00
Kent Overstreet	da23795e4c	bcachefs: thread_with_file: add f_ops.flush Add a flush op, to return the exit code via close(). Also update bcachefs usage to use this to return fsck exit codes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:20 -04:00
Kent Overstreet	6b33312925	bcachefs: thread_with_file: Fix missing va_end() Fixes: https://lore.kernel.org/linux-bcachefs/202402131603.E953E2CF@keescook/T/#u Reported-by: coverity scan Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:18 -04:00
Darrick J. Wong	658a1e42ce	bcachefs: thread_with_file: allow ioctls against these files Make it so that a thread_with_stdio user can handle ioctls against the file descriptor. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:15 -04:00
Darrick J. Wong	ab6752e24e	bcachefs: thread_with_file: create ops structure for thread_with_stdio Create an ops structure so we can add more file-based functionality in the next few patches. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:13 -04:00
Darrick J. Wong	1cbae651e5	bcachefs: thread_with_file: fix various printf problems Experimentally fix some problems with stdio_redirect_vprintf by creating a MOO variant with which we can experiment. We can't do a GFP_KERNEL allocation while holding the spinlock, and I don't like how the printf function can silently truncate the output if memory allocation fails. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:10 -04:00
Darrick J. Wong	fcb1620edd	bcachefs: thread_with_file: allow creation of readonly files Create a new run_thread_with_stdout function that opens a file in O_RDONLY mode so that the kernel can write things to userspace but userspace cannot write to the kernel. This will be used to convey xfs health event information to userspace. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:08 -04:00
Kent Overstreet	a5a650d647	bcachefs: thread_with_stdio: suppress hung task warning Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:04 -04:00
Kent Overstreet	8f9320d3a3	bcachefs: thread_with_stdio: Mark completed in ->release() This fixes stdio_redirect_read() getting stuck, not noticing that the pipe has been closed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 21:22:00 -04:00
Kent Overstreet	032b3fd057	bcachefs: Thread with file documentation Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 18:39:13 -04:00
Kent Overstreet	f704f108af	bcachefs: thread_with_stdio: fix bch2_stdio_redirect_readline() This fixes a bug where we'd return data without waiting for a newline, if data was present but a newline was not. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 18:39:13 -04:00
Kent Overstreet	a6777ca4ff	bcachefs: thread_with_stdio: kill thread_with_stdio_done() Move the cleanup code to a wrapper function, where we can call it after the thread_with_stdio fn exits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 18:39:13 -04:00
Kent Overstreet	60e1baa872	bcachefs: thread_with_stdio: convert to darray - eliminate the dependency on printbufs, so that we can lift thread_with_file for use in xfs - add a nonblocking parameter to stdio_redirect_printf(), and either block if the buffer is full or drop it on the floor - don't buffer infinitely Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 18:39:13 -04:00
Kent Overstreet	e017047fdb	bcachefs: thread_with_stdio: eliminate double buffering The output buffer lock has to be a spinlock so that we can write to it from interrupt context, so we can't use a direct copy_to_user; this switches thread_with_file_read() to use fault_in_writeable() and copy_to_user_nofault(), similar to how thread_with_file_write() works. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 18:39:13 -04:00
Kent Overstreet	cb6fc943b6	bcachefs: kill kvpmalloc() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-13 18:39:12 -04:00
Linus Torvalds	910202f00a	vfs-6.9.super -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZem4DwAKCRCRxhvAZXjc ooTRAQDRI6Qz6wJym5Yblta8BScMGbt/SgrdgkoCvT6y83MtqwD+Nv/AZQzi3A3l 9NdULtniW1reuCYkc8R7dYM8S+yAwAc= =Y1qX -----END PGP SIGNATURE----- Merge tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull block handle updates from Christian Brauner: "Last cycle we changed opening of block devices, and opening a block device would return a bdev_handle. This allowed us to implement support for restricting and forbidding writes to mounted block devices. It was accompanied by converting and adding helpers to operate on bdev_handles instead of plain block devices. That was already a good step forward but ultimately it isn't necessary to have special purpose helpers for opening block devices internally that return a bdev_handle. Fundamentally, opening a block device internally should just be equivalent to opening files. So now all internal opens of block devices return files just as a userspace open would. Instead of introducing a separate indirection into bdev_open_by_() via struct bdev_handle bdev_file_open_by_() is made to just return a struct file. Opening and closing a block device just becomes equivalent to opening and closing a file. This all works well because internally we already have a pseudo fs for block devices and so opening block devices is simple. There's a few places where we needed to be careful such as during boot when the kernel is supposed to mount the rootfs directly without init doing it. Here we need to take care to ensure that we flush out any asynchronous file close. That's what we already do for opening, unpacking, and closing the initramfs. So nothing new here. The equivalence of opening and closing block devices to regular files is a win in and of itself. But it also has various other advantages. We can remove struct bdev_handle completely. Various low-level helpers are now private to the block layer. Other helpers were simply removable completely. A follow-up series that is already reviewed builds on this and makes it possible to remove bdev->bd_inode and allows various clean ups of the buffer head code as well. All places where we stashed a bdev_handle now just stash a file and use simple accessors to get to the actual block device which was already the case for bdev_handle" * tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits) block: remove bdev_handle completely block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access bdev: remove bdev pointer from struct bdev_handle bdev: make struct bdev_handle private to the block layer bdev: make bdev_{release, open_by_dev}() private to block layer bdev: remove bdev_open_by_path() reiserfs: port block device access to file ocfs2: port block device access to file nfs: port block device access to files jfs: port block device access to file f2fs: port block device access to files ext4: port block device access to file erofs: port device access to file btrfs: port device access to file bcachefs: port block device access to file target: port block device access to file s390: port block device access to file nvme: port block device access to file block2mtd: port device access to files bcache: port block device access to files ...	2024-03-11 10:52:34 -07:00
Kent Overstreet	737cd174d1	bcachefs: bch2_lookup() gives better error message on inode not found When a dirent points to a missing inode, we really should print out the dirent. This requires quite a bit of refactoring, but there's some other benefits: we now do the entire lookup (dirent and inode) in a single btree transaction, and copy to the VFS inode with btree locks still held, like the create path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:09 -04:00
Kent Overstreet	a91bc5e505	bcachefs: bch2_inode_insert() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:09 -04:00
Kent Overstreet	3d4998c202	bcachefs: factor out check_inode_backpointer() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:09 -04:00
Kent Overstreet	11def1888f	bcachefs: Factor out check_subvol_dirent() Going to be adding more code here for checking subvol structure. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:09 -04:00
Kent Overstreet	ce3e9283de	bcachefs: Kill some -EINVALs Repurposing standard error codes in bcachefs code is banned in new code, and we need to get rid of the remaining ones - private error codes give us much better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:09 -04:00
Kent Overstreet	82fdc1dc98	bcachefs: bump max_active on btree_interior_update_worker WQ_UNBOUND with max_active 1 means ordered workqueue, but we don't actually need or want ordered semantics - and probably want a higher concurrency limit anyways. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:09 -04:00
Kent Overstreet	69c8e6ce02	bcachefs: move fsck_write_inode() to inode.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:09 -04:00
Kent Overstreet	29223b5a55	bcachefs: Initialize super_block->s_uuid Need to fix this oversight for the new FS_IOC_(GET|SET)UUID ioctls. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:09 -04:00
Kent Overstreet	f8f8fb443b	bcachefs: Switch to uuid_to_fsid() switch the statfs code from something horrible and open coded to the more standard uuid_to_fsid() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:09 -04:00
Kent Overstreet	7f76b08aca	bcachefs: Subvolumes may now be renamed Files within a subvolume cannot be renamed into another subvolume, but subvolumes themselves were intended to be. This implements subvolume renaming - we need to ensure that there's only a single dirent that points to a subvolume key (not multiple versions in different snapshots), and we need to ensure that dirent.d_parent_subvol and inode.bi_parent_subvol are updated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	5f43b0134e	bcachefs: btree node prefetching in check_topology btree_and_journal_iter is old code that we want to get rid of, but we're not ready to yet. lack of btree node prefetching is, it turns out, a real performance issue for fsck on spinning rust, so - add it. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	fc634d8e46	bcachefs: btree_and_journal_iter.trans we now always have a btree_trans when using a btree_and_journal_iter; prep work for adding prefetching to btree_and_journal_iter Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	916abefd43	bcachefs: better journal pipelining Recently a severe performance regression was discovered, which bisected to `a6548c8b5e` bcachefs: Avoid flushing the journal in the discard path It turns out the old behaviour, which issued excessive journal flushes, worked around a performance issue where queueing delays would cause the journal to not be able to write quickly enough and stall. The journal flushes masked the issue because they periodically flushed the device write cache, reducing write latency for non flushes. This patch reworks the journalling code to allow more than one (non-flush) write to be in flight at a time. With this patch, doing 4k random writes and an iodepth of 128, we are now able to hit 560k iops to a Samsung 970 EVO Plus - previously, we were stuck in the ~200k range. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	38789c2508	bcachefs: closure per journal buf Prep work for having multiple journal writes in flight. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	5165400275	bcachefs: bio per journal buf Prep work for having multiple journal writes in flight. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	52f7d75e7d	bcachefs: jset_entry_datetime This gives us a way to record the date and time every journal entry was written - useful for debugging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	3d3d23b341	bcachefs: improve journal entry read fsck error messages Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	a555bcf4fa	bcachefs: convert journal replay ptrs to darray Eliminates some error paths - no longer have a hardcoded BCH_REPLICAS_MAX limit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	5b6271b509	bcachefs: Cleanup bch2_dirent_lookup_trans() Drop an unnecessary bch2_subvolume_get_snapshot() call, and drop the __ from the name - this is a normal interface. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	23f2522315	bcachefs: bch2_hash_set_snapshot() -> bch2_hash_set_in_snapshot() Minor renaming for clarity, bit of refactoring. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	6b83aee8a4	bcachefs: Workqueues should be WQ_HIGHPRI Most bcachefs workqueues are used for completions, and should be WQ_HIGHPRI - this helps reduce queuing delays, we want to complete quickly once we can no longer signal backpressure by blocking. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	3f305e0498	bcachefs: Improve bch2_dirent_to_text() For DT_SUBVOL, we now print both parent and child subvol IDs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	7b05ecbafc	bcachefs: fixup for building in userspace Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	e6fab655e6	bcachefs: Avoid taking journal lock unnecessarily Previously, any time we failed to get a journal reservation we'd retry, with the journal lock held; but this isn't necessary given wait_event()/wake_up() ordering. This avoids performance cliffs when the journal starts to get backed up and lock contention shoots up. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	bdec47f57f	bcachefs: Journal writes should be REQ_SYNC|REQ_META Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	a4e9233911	bcachefs: Avoid setting j->write_work unnecessarily Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	656f05d8bd	bcachefs: Split out journal workqueue We don't want journal write completions to be blocked behind btree transactions - io_complete_wq is used for btree updates after data and metadata writes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Kent Overstreet	4f70176cb9	bcachefs: Kill unnecessary wakeups in journal reclaim Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:08 -04:00
Guoyu Ou	0be5b38bce	bcachefs: skip invisible entries in empty subvolume checking When we are checking whether a subvolume is empty in the specified snapshot, entries that do not belong to this subvolume should be skipped. This fixes the following case: $ bcachefs subvolume create ./sub $ cd sub $ bcachefs subvolume create ./sub2 $ bcachefs subvolume snapshot . ./snap $ ls -a snap . .. $ rmdir snap rmdir: failed to remove 'snap': Directory not empty As Kent suggested, we pass 0 in may_delete_deleted_inode() to ignore subvols in the subvol we are checking, because inode.bi_subvol is only set on subvolume roots, and we can't go through every inode in the subvolume and change bi_subvol when taking a snapshot. It makes the check less strict, but that's ok, the rest of fsck will still catch it. Signed-off-by: Guoyu Ou <benogy@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:34:07 -04:00
Kent Overstreet	067f244c9e	bcachefs: fix split brain message Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:30:56 -04:00
Kent Overstreet	fadc6067f2	bcachefs: Set path->uptodate when no node at level We were failing to set path->uptodate when reaching the end of a btree node iterator, causing the new prefetch code for backpointers gc to go into an infinite loop. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:30:56 -04:00
Kent Overstreet	94817db956	bcachefs: Correctly validate k->u64s in btree node read path validate_bset_keys() never properly validated k->u64s; it checked if it was 0, but not if it was smaller than keys for the given packed format; this fixes that small oversight. This patch was backported, so it's adding quite a few error enums so that they don't get renumbered and we don't have confusing gaps. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:21:04 -04:00
Kent Overstreet	b3eba6a4a7	bcachefs: Fix degraded mode fsck We don't know where the superblock and journal lives on offline devices; that means if a device is offline fsck can't check those buckets. Previously, fsck would incorrectly clear bucket data types for those buckets on offline devices; now we just use the previous state. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:18:45 -04:00
Kent Overstreet	ba89083e9f	bcachefs: Fix journal replay with unreadable btree roots When a btree root is unreadable, we still might be able to get some data back by replaying what's in the journal. Previously though, we got confused when journal replay would attempt to replay a key for a level that didn't exist. This adds bch2_btree_increase_depth(), so that journal replay can handle this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:18:13 -04:00
Kent Overstreet	52f3a72fa7	bcachefs: fix check_inode_deleted_list() check_inode_deleted_list() returns true if the inode is on the deleted list; check_inode() was checking the return code incorrectly. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:17:00 -04:00
Kent Overstreet	2f300f09c7	bcachefs: no_splitbrain_check option This adds an option to disable kicking out devices when splitbrain is detected - it seems there's some issues with splitbrain detection and we're kicking out devices erroneously. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:12:54 -04:00
Kent Overstreet	88005d5dfb	bcachefs: extent_entry_next_safe() We need to be able to iterate over extent ptrs that may be corrupted in order to print them - this fixes a bug where we'd pop an assert in bch2_bkey_durability_safe(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:12:13 -04:00
Kent Overstreet	6fa30fe7f7	bcachefs: journal_seq_blacklist_add() now handles entries being added out of order bch2_journal_seq_blacklist_add() was bugged when the new entry overlapped with multiple existing entries, and it also assumed new entries are being added in increasing order. This is true on any sane filesystem, but when trying to recover from very badly mangled filesystems we might end up with the journal sequence number rewinding vs. what the blacklist list knows about - easiest to just handle that here. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:09:59 -04:00
Li Zetao	f8cdf65b51	bcachefs: Fix null-ptr-deref in bch2_fs_alloc() There is a null-ptr-deref issue reported by kasan: KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] Call Trace: <TASK> bch2_fs_alloc+0x1092/0x2170 [bcachefs] bch2_fs_open+0x683/0xe10 [bcachefs] ... When initializing the name of bch_fs, it needs to dynamically alloc memory to meet the length of the name. However, when name allocation failed, it will cause a null-ptr-deref access exception in subsequent string copy. Fix this issue by checking if name allocation is successful. Fixes: `401ec4db63` ("bcachefs: Printbuf rework") Signed-off-by: Li Zetao <lizetao1@huawei.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-03-10 15:09:59 -04:00
Kent Overstreet	5197728f81	bcachefs: fix bch2_save_backtrace() Missed a call in the previous fix. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-02-25 15:45:36 -05:00
Christian Brauner	9f2f767f5e	bcachefs: port block device access to file Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-18-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-02-25 12:05:25 +01:00
Kent Overstreet	c4333eb541	bcachefs: Fix check_snapshot() memcpy check_snapshot() copies the bch_snapshot to a temporary to easily handle older versions that don't have all the fields of the current version, but it lacked a min() to correctly handle keys newer and larger than the current version. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-02-24 20:47:47 -05:00
Kent Overstreet	097471f9e4	bcachefs: Fix bch2_journal_flush_device_pins() If a journal write errored, the list of devices it was written to could be empty - we're not supposed to mark an empty replicas list. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-02-24 20:46:48 -05:00
Brian Foster	b58b1b883b	bcachefs: fix iov_iter count underflow on sub-block dio read bch2_direct_IO_read() checks the request offset and size for sector alignment and then falls through to a couple calculations to shrink the size of the request based on the inode size. The problem is that these checks round up to the fs block size, which runs the risk of underflowing iter->count if the block size happens to be large enough. This is triggered by fstest generic/361 with a 4k block size, which subsequently leads to a crash. To avoid this crash, check that the shorten length doesn't exceed the overall length of the iter. Fixes: Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Su Yue <glass.su@suse.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-02-24 20:45:24 -05:00
Kent Overstreet	204f45140f	bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree If we're in FILTER_SNAPSHOTS mode and we start scanning a range of the keyspace where no keys are visible in the current snapshot, we have a problem - we'll scan for a very long time before scanning terminates. A while back, this was fixed for most cases with peek_upto() (and assertions that enforce that it's being used). But the fix missed the fact that the inodes btree is different - every key offset is in a different snapshot tree, not just the inode field. Fixes: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-02-24 20:41:46 -05:00
Kent Overstreet	04fee68dd9	bcachefs: Kill __GFP_NOFAIL in buffered read path Recently, we fixed our __GFP_NOFAIL usage in the readahead path, but the easy one in read_single_folio() (where we can return an error) was missed - oops. Fixes: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-02-24 20:41:42 -05:00
Kent Overstreet	1f626223a0	bcachefs: fix backpointer_to_text() when dev does not exist Fixes: Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2024-02-24 20:41:37 -05:00

3361 Commits