linux

iv/linux

Author	SHA1	Message	Date
Kent Overstreet	bf5a261c7a	bcachefs: Assorted fixes for clang clang had a few more warnings about enum conversion, and also didn't like the opts.c initializer. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:09 -04:00
Kent Overstreet	7904c82cea	bcachefs: Move fsck_inode_rm() to inode.c Prep work for the new deleted inodes btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:09 -04:00
Kent Overstreet	e8d2fe3b4b	bcachefs: Consolidate btree id properties This refactoring centralizes defining per-btree properties. bch2_key_types_allowed was also about to overflow a u32, so expand that to a u64. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:09 -04:00
Kent Overstreet	85beefefd2	bcachefs: bch2_trans_update_extent_overwrite() Factor out a new helper, to be used when fsck has to repair overlapping extents. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	3f4ab4c1e6	bcachefs: Fix minor memory leak on invalid bkey Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	ac319b4f89	bcachefs: Move some declarations to the correct header Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	4437590d5f	bcachefs: Fix btree iter leak in __bch2_insert_snapshot_whiteouts() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	2a89a3e968	bcachefs: Fix a null ptr deref in check_xattr() We were attempting to initialize inode hash info when no inodes were found. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	8e992c6c1f	bcachefs: bch2_btree_bit_mod() New helper for bitset btrees. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	4dc5bb9adf	bcachefs: move inode triggers to inode.c bit of reorg Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	9d8a3c95dc	bcachefs: fsck: delete dead code Delete the old, now reimplemented overlapping extent check/repair. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	922bc5a037	bcachefs: Make topology repair a normal recovery pass This adds bch2_run_explicit_recovery_pass(), for rewinding recovery and explicitly running a specific recovery pass - this is a more general replacement for how we were running topology repair before. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	ae2e13d780	bcachefs: bch2_run_explicit_recovery_pass() This introduces bch2_run_explicit_recovery_pass() and uses it for when fsck detects that we need to re-run dead snaphots cleanup, and makes dead snapshot cleanup more like a normal recovery pass. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	ef1634f0f1	bcachefs: Print version, options earlier in startup path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Brian Foster	60a5b89800	bcachefs: use prejournaled key updates for write buffer flushes The write buffer mechanism journals keys twice in certain situations. A key is always journaled on write buffer insertion, and is potentially journaled again if a write buffer flush falls into either of the slow btree insert paths. This has shown to cause journal recovery ordering problems in the event of an untimely crash. For example, consider if a key is inserted into index 0 of a write buffer, the active write buffer switches to index 1, the key is deleted in index 1, and then index 0 is flushed. If the original key is rejournaled in the btree update from the index 0 flush, the (now deleted) key is journaled in a seq buffer ahead of the latest version of key (which was journaled when the key was deleted in index 1). If the fs crashes while this is still observable in the log, recovery sees the key from the btree update after the delete key from the write buffer insert, which is the incorrect order. This problem is occasionally reproduced by generic/388 and generally manifests as one or more backpointer entry inconsistencies. To avoid this problem, never rejournal write buffered key updates to the associated btree. Instead, use prejournaled key updates to pass the journal seq of the write buffer insert down to the btree insert, which updates the btree leaf pin to reflect the seq of the key. Note that tracking the seq is required instead of just using NOJOURNAL here because otherwise we lose protection of the write buffer pin when the buffer is flushed, which means the key can fall off the tail of the on-disk journal before the btree leaf is flushed and lead to similar recovery inconsistencies. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Brian Foster	eabb10dc95	bcachefs: support btree updates of prejournaled keys Introduce support for prejournaled key updates. This allows a transaction to commit an update for a key that already exists (and is pinned) in the journal. This is required for btree write buffer updates as the current scheme of journaling both on write buffer insertion and write buffer (slow path) flush is unsafe in certain crash recovery scenarios. Create a small trans update wrapper to pass along the seq where the key resides into the btree_insert_entry. From there, trans commit passes the seq into the btree insert path where it is used to manage the journal pin for the associated btree leaf. Note that this patch only introduces the underlying mechanism and otherwise includes no functional changes. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Brian Foster	78623ee0d0	bcachefs: fold bch2_trans_update_by_path_trace() into callers There is only one other caller so eliminate some boilerplate. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Brian Foster	a2437bba05	bcachefs: remove unnecessary btree_insert_key_leaf() wrapper This is in preparation to support prejournaled keys. We want the ability to optionally pass a seq stored in the btree update rather than the seq of the committing transaction. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Brian Foster	2110f21ec0	bcachefs: remove duplicate code between backpointer update paths Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Brian Foster	f7b3e651de	MAINTAINERS: add Brian Foster as a reviewer for bcachefs Brian has been playing with bcachefs for several months now and has offerred to commit time to patch review. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	970a5096ac	bcachefs: Suppresss various error messages in no_data_io mode We commonly use no_data_io mode when debugging filesystem metadata dumps, where data checksum/compression errors are expected and unimportant - this patch suppresses these. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	20e6d9a8d4	bcachefs: Fix lookup_inode_for_snapshot() This fixes a use-after-free. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	6b20d746ad	bcachefs: need_snapshot_cleanup shouldn't be a fsck error We currently don't track whether snapshot cleanup still needs to finish (aside from running a full fsck), so it shouldn't be a fsck error yet - fsck -n after fsck has succesfully completed shouldn't error. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	464ee1929b	bcachefs: Improve key_visible_in_snapshot() Delete a redundant bch2_snapshot_is_ancestor() check, and convert some assertions to debug assertions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	a397b8df5e	bcachefs: Refactor overlapping extent checks Make the overlapping extent check/repair code more self contained. This is prep work for hopefully reducing key_visible_in_snapshot() usage here as well, and also includes a nice performance optimization to not check ref_visible2() unless the extents potentially overlap. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:08 -04:00
Kent Overstreet	a0076086da	bcachefs: check_extent(): don't use key_visible_in_snapshot() This changes the main part of check_extents(), that checks the extent against the corresponding inode, to not use key_visible_in_snapshot(). key_visible_in_snapshot() has to iterate over the list of ancestor overwrites repeatedly calling bch2_snapshot_is_ancestor(), so this is a significant performance improvement. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	650eb16b45	bcachefs: check_extent() refactoring More prep work for reducing key_visible_in_snapshot() usage - this rearranges how KEY_TYPE_whitout keys are handled, so that they can be marked off in inode_warker->inode->seen_this_pos. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	a57f4d6113	bcachefs: fsck: walk_inode() now takes is_whiteout We only want to synthesize an inode for the current snapshot ID for non whiteouts - this refactoring lets us call walk_inode() earlier and clean up some control flow. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	0d8f320dd7	bcachefs: Simplify check_extent() Minor refactoring/dead code deletion, prep work for reworking check_extent() to avoid key_visible_in_snapshot(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	43b81a4eac	bcachefs: overlapping_extents_found() This improves the repair path for overlapping extents - we now verify that we find in the btree the overlapping extents that the algorithm detected, and fail the fsck run with a more useful error if it doesn't match. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	f9f52bc4f0	bcachefs: fsck: inode_walker: last_pos, seen_this_pos Prep work for changing check_extent() to avoid key_visible_in_snapshot() - this adds the state to track whether an inode has seen an extent at this pos. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	5897505e67	bcachefs: check_extents(): make sure to check i_sectors for last inode Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	93de9e92c3	bcachefs: Inline bch2_snapshot_is_ancestor() fast path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	813e0cecd1	bcachefs: Upgrade path fixes Some minor fixes to not print errors that are actually due to a verson upgrade. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	6132c84cac	bcachefs: is_ancestor bitmap Further optimization for bch2_snapshot_is_ancestor(). We add a small inline bitmap to snapshot_t, which indicates which of the next 128 snapshot IDs are ancestors of the current id - eliminating the last few iterations of the loop in bch2_snapshot_is_ancestor(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Mikulas Patocka	5eaa76d813	bcachefs: mark bch_inode_info and bkey_cached as reclaimable Mark these caches as reclaimable, so that available memory is correctly reported when there is a lot of cached inodes. Note that more work is needed - you should add __GFP_RECLAIMABLE to some of the kmalloc calls, so that they are allocated from the "kmalloc-rcl-*" caches. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	986e9842fb	bcachefs: Compression levels This allows including a compression level when specifying a compression type, e.g. compression=zstd:15 Values from 1 through 15 indicate compression levels, 0 or unspecified indicates the default. For LZ4, values 3-15 specify that the HC algorithm should be used. Note that for compatibility, extents themselves only include the compression type, not the compression level. This means that specifying the same compression algorithm but different compression levels for the compression and background_compression options will have no effect. XXX: perhaps we could add a warning for this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	e86e9124ca	bcachefs: Extent sb compression type fields to 8 bits The upper 4 bits are for compression level. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	a5cf5a4b41	bcachefs: bcachefs_format.h should be using __u64 Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	a0f8faea5f	bcachefs: fix_errors option is now a proper enum Before, it was parsed as a bool but internally it was really an enum: this lets us pass in all the possible values. But we special case the option parsing: no supplied value is parsed as FSCK_FIX_yes, to match the previous behaviour. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	9f343e24f5	bcachefs: bch_opt_fn Minor refactoring to get rid of some unneeded token pasting. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	8479938d7a	bcachefs: Convert snapshot table to RCU array This switches the generic radix tree for the in-memory table of snapshot nodes to a simple rcu array. This means we have to add new locking to deal with reallocations, but is faster than traversing the radix tree. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	d82978ca15	bcachefs: Add a race_fault() for write buffer slowpath We haven't hooked up dynamic fault injection quite yet, but we will soon Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	f39d1aca4d	bcachefs: Add buffered IO fallback for userspace In userspace, we want to be able to switch to buffered IO when we're dealing with an image on a filesystem/device that doesn't support the blocksize the filesystem was formatted with. This plumbs through !opts.direct_io -> FMODE_BUFFERED, which will be supported by the shim version of blkdev_get_by_path() in -tools, and it adds a fallback to disable direct IO and retry for userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	a09818c7e7	bcachefs: Fallocate now checks page cache Previously, fallocate would only check the state of the extents btree when determining if we need to create a reservation. But the page cache might already have dirty data or a disk reservation. This changes __bchfs_fallocate() to call bch2_seek_pagecache_hole() to check for this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	ea28c86722	bcachefs: Don't start copygc until recovery is finished With "bcachefs: Snapshot depth, skiplist fields", we now can't run data move operations until after bch2_check_snapshots() is complete. Ideally we'd have the copygc (and rebalance) threads wait until c->curr_recovery_pass has advanced, but the waitlist handling is tricky - so for now, move starting copygc back to read_write_late(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	b912913613	bcachefs: Fix build error on weird gcc fixes ./include/linux/stddef.h:8:14: error: positional initialization of field in ‘struct’ declared with ‘designated_init’ attribute [-Werror=designated-init] Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:07 -04:00
Kent Overstreet	f26c67f4a7	bcachefs: Snapshot depth, skiplist fields This extents KEY_TYPE_snapshot to include some new fields: - depth, to indicate depth of this particular node from the root - skip[3], skiplist entries for quickly walking back up to the root These are to improve bch2_snapshot_is_ancestor(), making it O(ln(n)) instead of O(n) in the snapshot tree depth. Skiplist nodes are picked at random from the set of ancestor nodes, not some fixed fraction. This introduces bcachefs_metadata_version 1.1, snapshot_skiplists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:06 -04:00
Kent Overstreet	065bd3356c	bcachefs: Version table now lists required recovery passes Now that we've got forward compatibility sorted out, we should be doing more frequent version upgrades in the future. To avoid having to run a full fsck for every version upgrade, this improves the BCH_METADATA_VERSIONS() table to explicitly specify a bitmask of recovery passes to run when upgrading to or past a given version. This means we can also delete PASS_UPGRADE(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:06 -04:00
Kent Overstreet	6619d84626	bcachefs: bch2_sb_maybe_downgrade(), bch2_sb_upgrade() Add some new helpers, and fix upgrade/downgrade in bch2_fs_initialize(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:10:06 -04:00

1 2 3 4 5 ...

1217739 Commits