linux

iv/linux

Author	SHA1	Message	Date
Kent Overstreet	57c723de7d	bcachefs: Rework __bch2_data_update_index_update() This makes some improvements to the logic for adding/removing replicas, as part of the larger erasure coding improvements. We now directly consider number of replicas desired for the given inode, and extent/pointer durability: this ensures that the extent ends up with the desired number of replicas when we're replacing multiple pointers with one that has higher durability (e.g. erasure coded). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	702ffea204	bcachefs: Extent helper improvements - __bch2_bkey_drop_ptr() -> bch2_bkey_drop_ptr_noerror(), now available outside extents. - Split bch2_bkey_has_device() and bch2_bkey_has_device_c(), const and non const versions - bch2_extent_has_ptr() now returns the pointer it found Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:56 -04:00
Kent Overstreet	2f528663c5	bcachefs: moving_context->stats is allowed to be NULL Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	11bb67a4a3	bcachefs: bch2_data_update_init() considers ptr durability Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:55 -04:00
Kent Overstreet	039c45feef	bcachefs: bch2_data_update_index_update() -> bch2_trans_run() Convert to use the standard helper Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00
Kent Overstreet	33669e0cc9	bcachefs: Add option for completely disabling nocow This adds an option for completely disabling nocow mode, including the locking in the data move path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00
Kent Overstreet	c9163bb03b	bcachefs: Cached pointers should not be erasure coded There's no reason to erasure code cached pointers: we'll always have another copy, and it'll be cheaper to read the other copy than do a reconstruct read. And erasure coded cached pointers would add complications that we'd rather not have to deal with, so let's make sure to disallow them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:54 -04:00
Kent Overstreet	f2a53270c7	bcachefs: Fix insert_snapshot_whiteouts() - We were failing to set the key type on the whiteouts it was creating, oops. - Also, we need to create whiteouts when generating front splits, not just back splits. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:53 -04:00
Kent Overstreet	2798143aa8	bcachefs: bch2_btree_insert_nonextent() This adds a new helper to delete some redundant code in bch2_trans_update_extent(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:53 -04:00
Kent Overstreet	09d70d0be1	bcachefs: Nocow locking fixup Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:52 -04:00
Daniel Hill	3482dd6a25	bcachefs: don't block reads if we're promoting The promote path calls data_update_init() and now that we take locks here, there's potential for promote to block our read path, just error when we can't take the lock instead of blocking. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:52 -04:00
Kent Overstreet	7ffb6a7ec6	bcachefs: Fix deadlock on nocow locks in data move path The recent nocow locking rework introduced a deadlock in the data move path: the new nocow locking scheme uses a hash table with a fixed size array for chaining, meaning on hash collision we may have to wait for other locks to be released before we can lock a bucket. And since the data move path needs to submit writes from the same thread that's taking nocow locks and submitting reads, this introduces a deadlock. This shouldn't happen often in practice, but since the data move path can keep large numbers of IOs in flight simultaneously, it's something we have to handle. This patch makes move_ctxt_wait_event() available to bch2_data_update_init() and uses it when appropriate, which is our normal solution to this kind of thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:52 -04:00
Kent Overstreet	350175bf9b	bcachefs: Improved nocow locking This improves the nocow lock table so that hash table entries have multiple locks, and locks specify which bucket they're for - i.e. we can now resolve hash collisions. This is important because the allocator has to skip buckets that are locked in the nocow lock table, and previously hash collisions would cause it to spuriously skip unlocked buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:52 -04:00
Daniel Hill	f3a37e76ca	bcachefs: handle failed data_update_init cleanup data_update_init allocates several resources, but we forget to clean these up when it fails. Signed-off-by: Daniel Hill <daniel@gluo.nz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:52 -04:00
Kent Overstreet	a8b3a677e7	bcachefs: Nocow support This adds support for nocow mode, where we do writes in-place when possible. Patch components: - New boolean filesystem and inode option, nocow: note that when nocow is enabled, data checksumming and compression are implicitly disabled - To prevent in-place writes from racing with data moves (data_update.c) or bucket reuse (i.e. a bucket being reused and re-allocated while a nocow write is in flight, we have a new locking mechanism. Buckets can be locked for either data update or data move, using a fixed size hash table of two_state_shared locks. We don't have any chaining, meaning updates and moves to different buckets that hash to the same lock will wait unnecessarily - we'll want to watch for this becoming an issue. - The allocator path also needs to check for in-place writes in flight to a given bucket before giving it out: thus we add another counter to bucket_alloc_state so we can track this. - Fsync now may need to issue cache flushes to block devices instead of flushing the journal. We add a device bitmask to bch_inode_info, ei_devs_need_flush, which tracks devices that need to have flushes issued - note that this will lead to unnecessary flushes when other codepaths have already issued flushes, we may want to replace this with a sequence number. - New nocow write path: look up extents, and if they're writable write to them - otherwise fall back to the normal COW write path. XXX: switch to sequence numbers instead of bitmask for devs needing journal flush XXX: ei_quota_lock being a mutex means bch2_nocow_write_done() needs to run in process context - see if we can improve this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:51 -04:00
Kent Overstreet	4dcd1cae72	bcachefs: Data update support for unwritten extents The data update path requires special support for unwritten extents - we still need to be able to move them, but there's no need to read or write anything. This patch adds a new error code to tell bch2_move_extent() that we're short circuiting the read, and adds bch2_update_unwritten_extent() to create a reservation then call __bch2_data_update_index_update(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:51 -04:00
Kent Overstreet	b08b492ed3	bcachefs: Drop old maybe_extending optimization The extend update path had an optimization to avoid updating the inode if we knew we were definitely not extending the file. But now that we're updating inodes on every extent update - for fsync - that code can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:51 -04:00
Kent Overstreet	c9828cea31	bcachefs: Delete in memory ec backpointers Post btree backpointers, these aren't needed anymore. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:51 -04:00
Kent Overstreet	d7dd3fb84f	bcachefs: Fix rereplicate when we already have a cached pointer When we need to add more replicas to an extent, it might be the case that we already have a replica on every device, but some of them are cached. This patch fixes a bug where we'd spin on that extent because the write path fails to find a device we can allocate from: we allow allocating from devices that already have cached replicas on them, and change bch2_data_update_index_update() to drop the cached replica if needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:49 -04:00
Kent Overstreet	e88a75ebe8	bcachefs: New bpos_cmp(), bkey_cmp() replacements This patch introduces - bpos_eq() - bpos_lt() - bpos_le() - bpos_gt() - bpos_ge() and equivalent replacements for bkey_cmp(). Looking at the generated assembly these could probably be improved further, but we already see a significant code size improvement with this patch. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:47 -04:00
Kent Overstreet	a1ee777bfc	bcachefs: Kill BCH_WRITE_FLUSH BCH_WRITE_FLUSH is a write flag that causes a journal flush. It's only used in the direct IO path, and this will allow for some consolidation with the regular fsync path, which will help with the upcoming nocow mode. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:45 -04:00
Kent Overstreet	d4bce63636	bcachefs: Kill BCH_WRITE_JOURNAL_SEQ_PTR Dead code, delete. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:45 -04:00
Kent Overstreet	3e3e02e6bc	bcachefs: Assorted checkpatch fixes checkpatch.pl gives lots of warnings that we don't want - suggested ignore list: ASSIGN_IN_IF UNSPECIFIED_INT - bcachefs coding style prefers single token type names NEW_TYPEDEFS - typedefs are occasionally good FUNCTION_ARGUMENTS - we prefer to look at functions in .c files (hopefully with docbook documentation), not .h file prototypes MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other macros where we can't do this Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:44 -04:00
Kent Overstreet	1be887979b	bcachefs: Handle dropping pointers in data_update path Cached pointers are generally dropped, not moved: this led to an assertion firing in the data update path when there were no new replicas being written. This path adds a data_options field for pointers to be dropped, and tweaks move_extent() to check if we're only dropping pointers, not writing new ones, before kicking off a data update operation. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:42 -04:00
Kent Overstreet	674cfc2624	bcachefs: Add persistent counters for all tracepoints Also, do some reorganizing/renaming, convert atomic counters in bch_fs to persistent counters, and add a few missing counters. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:39 -04:00
Kent Overstreet	12043cf151	bcachefs: fsck: Another transaction restart handling fix Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:39 -04:00
Kent Overstreet	549d173c1b	bcachefs: EINTR -> BCH_ERR_transaction_restart Now that we have error codes, with subtypes, we can switch to our own error code for transaction restarts - and even better, a distinct error code for each transaction restart reason: clearer code and better debugging. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:37 -04:00
Kent Overstreet	597dee1cd6	bcachefs: Switch data_update path to snapshot_id_list snapshots_seen is becoming private to fsck, and snapshot_id_list is actually what the data update path needs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:35 -04:00
Kent Overstreet	7f5c5d20f0	bcachefs: Redo data_update interface This patch significantly cleans up and simplifies the data_update interface. Instead of only being able to specify a single pointer by device to rewrite, we're now able to specify any or all of the pointers in the original extent to be rewrited, as a bitmask. data_cmd is no more: the various pred functions now just return true if the extent should be moved/updated. All the data_update path does is rewrite existing replicas, or add new ones. This fixes a bug where with background compression on replicated filesystems, where rebalance -> data_update would incorrectly drop the wrong old replica, and keep trying to recompress an extent pointer and each time failing to drop the right replica. Oops. Now, the data update path doesn't look at the io options to decide which pointers to keep and which to drop - it only goes off of the data_update_options passed to it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:34 -04:00
Kent Overstreet	c501fef6de	bcachefs: Pull out data_update.c This is the start of reorganizing the data IO paths. The plan is to also break apart io.c into data_read.c and data_write.c, and migrate_write will be renamed to the data_update path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:34 -04:00

30 Commits