In the discard worker, we were failing to validate the bucket state -
meaning a corrupt needs_discard btree could cause us to discard a bucket
that we shouldn't.
If check_alloc_info hasn't run yet we just want to bail out; otherwise
it's a filesystem inconsistency error.
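Roughly, the shape of the check (a sketch only; the validation and
reporting helpers below are illustrative stand-ins, not the literal
bcachefs code):

  static int check_discard_bucket(struct bch_fs *c, struct bch_alloc_v4 *a)
  {
          if (bucket_state_ok(a))                         /* hypothetical validator */
                  return 0;                               /* safe to discard */

          if (!alloc_info_has_been_checked(c))            /* check_alloc_info not run yet */
                  return -EAGAIN;                         /* just bail out quietly */

          return report_fs_inconsistency(c, a);           /* hypothetical: flag/repair it */
  }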
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Nested transaction restart handling is typically best avoided; when the
inner context handles a transaction restart it invalidates the outer
transaction context, so we need to make sure to return a
transaction_restart_nested error.
This code wasn't doing that, and hit the assertion in
for_each_btree_key() that checks for that via trans->restart_count.
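The pattern, as a simplified sketch (run_inner_btree_work() is a
stand-in for the nested loop; the real code uses its own restart-count
helpers):

  static int outer_op(struct btree_trans *trans)
  {
          u32 restart_count = trans->restart_count;     /* snapshot before nested work */
          int ret = run_inner_btree_work(trans);        /* stand-in for the inner context */

          /* inner context handled a restart: outer iterators are now invalid */
          if (!ret && trans->restart_count != restart_count)
                  ret = -BCH_ERR_transaction_restart_nested;

          return ret;
  }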
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Now that we've got the errors_silent mechanism, we don't have to check
if the reconstruct_alloc option is set all over the place.
Also - users no longer have to explicitly select fsck and fix_errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Buckets usually can't be discarded until the transaction that made them
empty has been committed in the journal.
Tracing has indicated that we're queuing the discard worker excessively,
only for it to skip over many buckets that are still waiting on a
journal commit, discarding only one or two buckets per iteration.
We want to switch to only queuing the discard worker after a journal
flush write, but there's an important optimization we need to preserve:
if a bucket becomes empty and it was never committed in the journal
while it was in use, we want to discard it and reuse it right away -
since overwriting it before the previous writes are flushed from the
device cache means those writes only cost bus bandwidth.
So, this patch implements a fast path for buckets that can be discarded
right away. We need new locking between the two discard workers; the new
list of buckets being discarded provides that locking.
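In rough outline, the fast path looks something like this (sketch;
helper and field names are illustrative, and the real synchronization
details differ):

  /*
   * A bucket whose emptying was never committed to the journal can be
   * discarded immediately; the in-flight list is the new synchronization
   * with the regular discard worker.
   */
  if (!journal_seq_of_bucket_update) {
          mutex_lock(&ca->discard_buckets_in_flight_lock);
          add_to_discard_in_flight_list(ca, bucket);      /* excludes the other worker */
          mutex_unlock(&ca->discard_buckets_in_flight_lock);

          queue_work(c->write_ref_wq, &ca->discard_fast_work);
  }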
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_trigger_alloc() kicks off certain tasks on bucket state changes;
e.g. triggering the bucket discard worker and the invalidate worker.
We've observed the discard worker running too often - on most runs it
doesn't do any work, according to the tracepoint - so clearly we're
kicking it off too often.
This adds an explicit statechange() macro to make these checks more
precise.
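The idea, roughly (a sketch; the actual macro in bch2_trigger_alloc()
may differ in detail, and kick_discard_worker() is a stand-in):

  /*
   * A predicate "becomes true" only if it was false for the old alloc
   * state (old_a) and true for the new one (new_a).
   */
  #define statechange(expr)       (!(old_a expr) && (new_a expr))

  if (statechange(->data_type == BCH_DATA_need_discard))
          kick_discard_worker(c);         /* only fire on the transition */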
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Some (bad) devices can have really terrible discard latency; we don't
want them blocking memory reclaim and causing warnings.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When issuing discards, we may need to flush the journal if there are too
many buckets that can't be discarded until a journal flush.
But the heuristic was bad: we should be comparing the number of buckets
that need a journal flush against the number of free buckets, not the
number of buckets we saw.
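Roughly, the check becomes (sketch; the threshold and variable names
here are illustrative):

  /* Flush when the buckets stuck behind a journal commit are significant
   * relative to the free buckets, not relative to however many buckets
   * this pass happened to walk. */
  if (need_journal_commit * 2 > free_buckets)     /* was effectively: > buckets_seen */
          bch2_journal_flush_async(&c->journal, NULL);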
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Prep work for disk space accounting rewrite: we're going to want to use
a single callback for both of our current triggers, so we need to change
them to have the same type signature first.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
for_each_btree_key() handles transaction restarts, like
for_each_btree_key2(), but only calls bch2_trans_begin() after a
transaction restart - for_each_btree_key2() wraps every loop iteration
in a transaction.
The for_each_btree_key() behaviour is problematic when it leads to
holding the SRCU lock that prevents key cache reclaim for an unbounded
amount of time - there's no real need to keep it around.
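Simplified, the for_each_btree_key2()-style loop looks like this
(sketch, heavily reduced from the real macros; iter/trans are assumed
to be set up by the macro and do_work() is a stand-in for the body):

  struct bkey_s_c k;
  int ret = 0;

  while (1) {
          bch2_trans_begin(trans);        /* every iteration: lets SRCU/key cache breathe */

          k = bch2_btree_iter_peek(&iter);
          ret = bkey_err(k);
          if (bch2_err_matches(ret, BCH_ERR_transaction_restart))
                  continue;
          if (ret || !k.k)
                  break;

          ret = do_work(trans, k);        /* stand-in for the loop body */
          if (ret)
                  break;

          bch2_btree_iter_advance(&iter);
  }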
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This code was somewhat convoluted because, originally, bch2_lru_set()
could modify the LRU index if there was a collision.
That's no longer the case, so the "create LRU entry" path has no reason
to update the alloc key, and we can separate the handling of the two
fsck errors.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This introduces bch2_bucket_sectors() and bch2_bucket_sectors_dirty(),
prep work for separately accounting stripe sectors.
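Roughly, what the new helpers compute (a sketch, not the exact
definitions; the real ones will grow stripe-awareness as part of the
accounting rework):

  static inline u64 bch2_bucket_sectors_dirty(struct bch_alloc_v4 a)
  {
          return a.dirty_sectors;
  }

  static inline u64 bch2_bucket_sectors(struct bch_alloc_v4 a)
  {
          return a.dirty_sectors + a.cached_sectors;
  }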
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_update_cached_sectors_list() is closer to how the new disk space
accounting works, called from trans_mark().
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The bucket_offset field of bch_backpointer is a 40-bit bitfield, but the
bch2_backpointer_swab() helper uses swab32. This leads to inconsistency
when an on-disk fs is accessed from an opposite endian machine.
As it turns out, we already have an internal swab40() helper that is
used from the bch_alloc_v4 swab callback. Lift it into the backpointers
header file and use it consistently in both places.
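The helper is just a five-byte byte swap, along these lines:

  /* Swap the five bytes of a 40-bit value held in the low bits of a u64 */
  static inline u64 swab40(u64 x)
  {
          return (x & 0x00000000ffULL) << 32 |
                 (x & 0x000000ff00ULL) << 16 |
                 (x & 0x0000ff0000ULL)       |
                 (x & 0x00ff000000ULL) >> 16 |
                 (x & 0xff00000000ULL) >> 32;
  }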
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
A simple test to populate a filesystem on one CPU architecture and
fsck on an arch of the opposite byte order produces errors related
to the fragmentation LRU. This occurs because the 64-bit
fragmentation_lru field is not byte-order swapped when reads detect
that the on-disk/bset key values were written in opposite byte-order
of the current CPU.
Update the bch2_alloc_v4 swab callback to handle fragmentation_lru
as is done for other multi-byte fields. This doesn't affect existing
filesystems when accessed by CPUs of the same endianness because the
->swab() callback is only called when the bset flags indicate an
endianness mismatch between the CPU and on-disk data.
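The fix is essentially a one-line addition to the existing swab
callback, something like:

  /* in the bch_alloc_v4 swab callback, alongside the other multi-byte fields */
  a->fragmentation_lru = swab64(a->fragmentation_lru);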
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This patch adds a superblock error counter for every distinct fsck
error; this means that when analyzing filesystems out in the wild we'll
be able to see what sorts of inconsistencies are being found and repaired,
and hence what bugs to look for.
Errors validating bkeys are not yet considered distinct fsck errors, but
this patch adds a new helper, bkey_fsck_err(), in order to add distinct
error types for them as well.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Since we can run with unknown btree IDs, we can't directly index btree
IDs into fixed size arrays.
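Concretely, anything that used to index a fixed-size per-btree array
needs a guard, roughly (sketch; the array here is hypothetical):

  /* Only known btree IDs get a slot in fixed-size arrays */
  static bool btree_id_is_known(enum btree_id id)
  {
          return id < BTREE_ID_NR;
  }

  /* callers: */
  if (btree_id_is_known(id))
          counters = &c->btree_counters[id];      /* hypothetical fixed-size array */
  else
          counters = NULL;                        /* unknown btree ID: handle separately */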
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_dev_resize() was never updated for the allocator rewrite with
persistent freelists, and it wasn't noticed because the tests weren't
running fsck - oops.
Fix this by running bch2_dev_freespace_init() for the new buckets.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
members_v2 has dynamically resizable entries so that we can extend
bch_member. The members can no longer be accessed with simple array
indexing. Instead, members_v2_get is used to find a member's exact
location within the array and return a copy of that member.
Alternatively, member_v2_get_mut retrieves a mutable pointer to a member.
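In other words, access is by byte offset using the per-field entry
size, roughly (sketch; the real struct layout and helper signatures may
differ):

  /* Member i lives at base + i * member_bytes, and is returned by value */
  static struct bch_member members_v2_get(struct bch_sb_field_members_v2 *mi, int i)
  {
          void *p = (void *) mi->_members + (size_t) i * le16_to_cpu(mi->member_bytes);
          struct bch_member ret = {};

          memcpy(&ret, p, min_t(size_t, sizeof(ret), le16_to_cpu(mi->member_bytes)));
          return ret;     /* copying tolerates entries shorter than the current struct */
  }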
Signed-off-by: Hunter Shaffer <huntershaffer182456@gmail.com>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We're using more stack than we'd like in a number of functions, and
btree_trans is the biggest object that we stack allocate.
But we have to do a heap allocation to initialize it anyway, so
there's no real downside to heap allocating the entire thing.
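Callers then get the transaction from a helper instead of declaring it
on the stack, along these lines (sketch; helper names illustrative):

  struct btree_trans *trans = bch2_trans_get(c);

  ret = do_btree_work(trans);     /* stand-in for the actual work */

  bch2_trans_put(trans);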
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When building bcachefs for 32-bit ARM, there is a compiler warning in
bch2_bucket_gens_invalid() due to use of an incorrect format specifier:
fs/bcachefs/alloc_background.c:530:10: error: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Werror,-Wformat]
529 | prt_printf(err, "bad val size (%lu != %zu)",
| ~~~
| %zu
530 | bkey_val_bytes(k.k), sizeof(struct bch_bucket_gens));
| ^~~~~~~~~~~~~~~~~~~
fs/bcachefs/util.h:223:54: note: expanded from macro 'prt_printf'
223 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__)
| ^~~~~~~~~~~
On 64-bit architectures, size_t is 'unsigned long', so there is no
warning when using %lu but on 32-bit architectures, size_t is 'unsigned
int'. Use '%zu', the format specifier for 'size_t', to eliminate the
warning.
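With the fix, both arguments are printed as size_t:

  prt_printf(err, "bad val size (%zu != %zu)",
             bkey_val_bytes(k.k), sizeof(struct bch_bucket_gens));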
Fixes: 4be0d766a7e9 ("bcachefs: bucket_gens btree")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When building bcachefs for 32-bit ARM, there is a compiler warning in
bch2_alloc_v4_invalid() due to use of an incorrect format specifier:
fs/bcachefs/alloc_background.c:246:30: error: format specifies type 'unsigned long' but the argument has type 'unsigned int' [-Werror,-Wformat]
245 | prt_printf(err, "bad val size (%u > %lu)",
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| %u
246 | alloc_v4_u64s(a.v), bkey_val_u64s(k.k));
| ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
fs/bcachefs/bkey.h:58:27: note: expanded from macro 'bkey_val_u64s'
58 | #define bkey_val_u64s(_k) ((_k)->u64s - BKEY_U64s)
| ^
fs/bcachefs/util.h:223:54: note: expanded from macro 'prt_printf'
223 | #define prt_printf(_out, ...) bch2_prt_printf(_out, __VA_ARGS__)
| ^~~~~~~~~~~
This expression is of type 'size_t'. On 64-bit architectures, size_t is
'unsigned long', so there is no warning when using %lu but on 32-bit
architectures, size_t is 'unsigned int'. Use '%zu', the format specifier
for 'size_t' to eliminate the warning.
Fixes: 11be8e8db283 ("bcachefs: New on disk format: Backpointers")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Since we set bucket data type to BCH_DATA_stripe based on the data
pointer, not just the stripe pointer, it doesn't make sense to check for
no stripe in the .key_invalid method - this is a situation that
shouldn't happen, but our other fsck/repair code handles it.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
clang had a few more warnings about enum conversion, and also didn't
like the opts.c initializer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Recovery and fsck have many different passes/jobs to do, which always
run in the same order - but not all of them run all the time. Some are
for fsck, some for unclean shutdown, some for version upgrades.
This adds some new structure: a defined list of recovery passes that we
can run in a loop, as well as consolidating the log messages.
The main benefit is consolidating the "should run this recovery pass"
logic, as well as cleaning up the "this recovery pass has finished"
state; instead of having a bunch of ad-hoc state bits in c->flags, we've
now got c->curr_recovery_pass.
By consolidating the "should run this recovery pass" logic, future
on-disk format upgrades will be able to say "upgrading to this version
requires x passes to run", instead of forcing all of fsck to run.
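The driver then becomes a simple loop over a table of passes, roughly
(sketch; table and helper names are simplified stand-ins):

  for (c->curr_recovery_pass = 0;
       c->curr_recovery_pass < BCH_RECOVERY_PASS_NR;
       c->curr_recovery_pass++) {
          struct recovery_pass_fn *p = recovery_pass_fns + c->curr_recovery_pass;

          if (!should_run_recovery_pass(c, c->curr_recovery_pass))
                  continue;                       /* fsck-only, upgrade-only, etc. */

          bch_info(c, "running recovery pass %s", p->name);       /* consolidated logging */
          ret = p->fn(c);
          if (ret)
                  return ret;
  }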
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This folds bch2_bucket_gens_read() into bch2_alloc_read(), doing the
version check there.
This is prep work for enumerating all recovery passes: we need some
cleanup first to make calling all the recovery passes consistent.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Version upgrades are not atomic operations: when we do a version upgrade
we need to update the superblock before we start using new features, and
then when the upgrade completes we need to update the superblock again.
This adds a new superblock field so we can detect and handle incomplete
version upgrades.
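On the write side the sequence is then roughly (sketch; the field
accessor shown is illustrative):

  /* Bump the version before using new features... */
  mutex_lock(&c->sb_lock);
  c->disk_sb.sb->version = cpu_to_le16(new_version);
  /* the new "upgrade complete" field still holds the old version here */
  bch2_write_super(c);
  mutex_unlock(&c->sb_lock);

  /* run whatever recovery passes the upgrade requires */

  /* ...and only mark the upgrade complete once they've finished */
  mutex_lock(&c->sb_lock);
  SET_BCH_SB_VERSION_UPGRADE_COMPLETE(c->disk_sb.sb, new_version);
  bch2_write_super(c);
  mutex_unlock(&c->sb_lock);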
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>