linux

iv/linux

Author	SHA1	Message	Date
Kent Overstreet	40022c0115	bcachefs: filemap_get_contig_folios_d() Add a new helper for getting a range of contiguous folios and returning them in a darray. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	a1774a0564	bcachefs: bch_folio_sector_state improvements - X-macro-ize the bch_folio_sector_state enum: this means we can easily generate strings, which is helpful for debugging. - Add helpers for state transitions: folio_sector_dirty(), folio_sector_undirty(), folio_sector_reserve() - Add folio_sector_set(), a single helper for changing folio sector state just so that we have a single place to instrument when we're debugging. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	959f7368d6	bcachefs: bch2_truncate_page() large folio conversion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	c42b57c451	bcachefs: bch2_buffered_write large folio conversion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	49fe78ff33	bcachefs: bch_folio can now handle multi-order folios Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	33e2eb9677	bcachefs: More assorted large folio conversion Various misc small conversions in fs-io.c for large folios. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	a86a92cb0d	bcachefs: bch2_seek_pagecache_data() folio conversion This converts bch2_seek_pagecache_data() to handle large folios. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	e8d28c3e47	bcachefs: bch2_seek_pagecache_hole() folio conversion This converts bch2_seek_pagecache_hole() to handle large folios. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	ff9c301f28	bcachefs: bio_for_each_segment_all() -> bio_for_each_folio_all() This converts the writepage end_io path to folios. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	30bff5944e	bcachefs: Initial folio conversion This converts fs-io.c to pass folios, not pages. We're not handling large folios yet, there's no functional changes in this patch - just a lot of churn doing the initial type conversions. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	3342ac134d	bcachefs: Rename bch_page_state -> bch_folio Start of the large folio conversion. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	c437e15379	bcachefs: Add a bch_page_state assert Seeing an odd bug with page/folio state not being properly initialized, this is to help track it down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	2776369266	bcachefs: Add a cond_resched() call to journal_keys_sort() We're just doing cpu work here and it could take awhile, a cond_resched() is definitely needed. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	bb6c4b92fd	bcachefs: Improve trace_move_extent_fail() This greatly expands the move_extent_fail tracepoint - now it includes all the information we have available, including exactly why the extent wasn't updated. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	8669199438	bcachefs: Print out counters correctly Most counters aren't in units of sectors, and the ones that are should just be switched to bytes, for simplicity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	dde72e1827	bcachefs: Add missing bch2_err_class() call We're not supposed to return our private error codes to userspace. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	62a03559d6	bcachefs: Rip out code for storing backpointers in alloc keys We don't store backpointers in alloc keys anymore, since we gained the btree write buffer. This patch drops support for backpointers in alloc keys, and revs the on disk format version so that we know a fsck is required. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Brian Foster	349b1d832b	bcachefs: use reservation for log messages during recovery If we block on journal reservation attempting to log journal messages during recovery, particularly for the first message(s) before we start doing actual work, chances are the filesystem ends up deadlocked. Allow logged messages to use reserved journal space to mitigate this problem. In the worst case where no space is available whatsoever, this at least allows the fs to recognize that the journal is stuck and fail the mount gracefully. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:59 -04:00
Kent Overstreet	3d86f13df6	bcachefs: Improve trans_restart_split_race tracepoint Seeing occasional test failures where we get stuck in a livelock that involves this event - this will help track it down. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	25d8f40560	bcachefs: Data update path no longer leaves cached replicas It turns out that it's currently impossible to invalidate buckets containing only cached data if they're part of a stripe. The normal bucket invalidate path can't do it because we have to be able to incerement the bucket's gen, which isn't correct becasue it's still a member of the stripe - and the bucket invalidate path makes the bucket availabel for reuse right away, which also isn't correct for buckets in stripes. What would work is invalidating cached data by following backpointers, except that cached replicas don't currently get backpointers - because they would be awkward for the existing bucket invalidate path to delete and they haven't been needed elsewhere. So for the time being, to prevent running out of space in stripes, switch the data update path to not leave cached replicas; we may revisit this in the future. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	32de2ea0d5	bcachefs: Rhashtable based buckets_in_flight for copygc Previously, copygc used a fifo for tracking buckets in flight - this had the disadvantage of being fixed size, since we pass references to elements into the move code. This restructures it to be a hash table and linked list, since with erasure coding we need to be able to pipeline across an arbitrary number of buckets. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	6bdefe9c39	bcachefs: Use BTREE_ITER_INTENT in ec_stripe_update_extent() This adds a flags param to bch2_backpointer_get_key() so that we can pass BTREE_ITER_INTENT, since ec_stripe_update_extent() is updating the extent immediately. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	4f77dcde28	bcachefs: move snapshot_t to subvolume_types.h this doesn't need to be in bcachefs.h Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	1546cf9727	bcachefs: Fix bch2_get_key_or_hole() This fixes an off by one error, due to confusing closed vs. half open intervals. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	2a6c302fb3	bcachefs: Check return code from need_whiteout_for_snapshot() This could return a transaction restart; we need to check for that. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	e9b9e475ea	bcachefs: bch2_dev_freespace_init() Print out status every 10 seconds It appears freespace init can still take awhile, and we've had a report or two of it getting stuck - let's have it print out where it's at every 10 seconds. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	b1c945b3fd	bcachefs: Run freespace init in device hot add path Like in the recovery, and device add, we have to check if devices don't have the freespace btree initialized - this was missed in the device hot add path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	0fb11e0801	bcachefs: Improved copygc wait debugging This just adds a line for how long copygc has been waiting to sysfs copygc_wait, helpful for debugging why copygc isn't running. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	11f117374a	bcachefs: Call bch2_path_put_nokeep() before bch2_path_put() bch2_path_put_nokeep() is sketchy, and we should consider removing it: it unconditionally frees btree_paths once their ref hits 0. The assumption is that we only use it for paths that have never been visible outside the btree core btree code; i.e. higher level code will never be making assumptions about locking based on these paths. However, there's subtle brokenness with this approach: - If we call bch2_path_put(), then bch2_path_put_nokeep(), bch2_path_put() may free the first path on the assumption that we we have another path keeping a node locked - but then bch2_path_put_nokeep() just unconditionally frees it. The same bug may arise if we're calling bch2_path_put() and bch2_path_put_nokeep() on the same (refcounted) path, or two adjacent paths that point to the same btree node. This patch hacks around one of these bugs by calling bch2_path_put_nokeep() first in bch2_trans_iter_exit. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Brian Foster	030e9f9264	bcachefs: drop unnecessary journal stuck check from space calculation The journal stucking check in bch2_journal_space_available() is particularly aggressive and can lead to premature shutdown in some rare cases. This is difficult to reproduce, but also comes along with a fatal error and so is worthwhile to be cautious. For example, we've seen instances where the journal is under heavy reservation pressure, the journal allocation path transitions into the final available journal bucket, the journal write path immediately consumes that bucket and calls into bch2_journal_space_available(), which then in turn flags the journal as stuck because there is no available space and shuts down the filesystem instead of submitting the journal write (that would have otherwise succeeded). To avoid this problem, simplify the journal stuck checking by just relying on the higher level logic in the journal reservation path. This produces more useful debug output and is a more reliable indicator that things have bogged down. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Brian Foster	db1bf72905	bcachefs: refactor journal stuck checking into standalone helper bcachefs checks for journal stuck conditions both in the journal space calculation code and the journal reservation slow path. The logic in both places is rather tricky and can result in non-deterministic failure characteristics and debug output. In preparation to condense journal stuck handling to a single place, refactor the __journal_res_get() logic into a standalone helper. Since multiple callers into the reservation code can result in duplicate reports, use the ->err_seq field as a serialization mechanism for the debug dump. Finally, add some comments to help explain the logic and hopefully facilitate further improvements in the future. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Brian Foster	23fd4f4dc6	bcachefs: gracefully unwind journal res slowpath on shutdown bcachefs detects journal stuck conditions in a couple different places. If the logic in the journal reservation slow path happens to detect the problem, I've seen instances where the filesystem remains deadlocked even though it has been shut down. This is occasionally reproduced by generic/333, and usually manifests as one or more tasks stuck in the journal reservation slow path. To help avoid this problem, repeat the journal error check in __journal_res_get() once under spinlock to cover the case where the previous lock holder might have triggered shutdown. This also helps avoid spurious/duplicate stuck reports. Also, wake the journal from the halt code to make sure blocked callers of the journal res slowpath have a chance to wake up and observe the pending error. This survives an overnight looping run of generic/333 without the aforementioned lockups. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Brian Foster	873555f04d	bcachefs: more aggressive fast path write buffer key flushing The btree write buffer flush code is prone to causing journal deadlock due to inefficient use and release of reservation space. Reservation is not pre-reserved for write buffered keys (as is done for key cache keys, for example), because the write buffer flush side uses a fast path that attempts insertion without need for any reservation at all. The write buffer flush attempts to deal with this by inserting keys using the BTREE_INSERT_JOURNAL_RECLAIM flag to return an error on journal reservations that require blocking. Upon first error, it falls back to a slow path that inserts in journal order and supports moving the associated journal pin forward. The problem is that under pathological conditions (i.e. smaller log, larger write buffer and journal reservation pressure), we've seen instances where the fast path fails fairly quickly without having completed many insertions, and then the slow path is unable to push the journal pin forward enough to free up the space it needs to completely flush the buffer. This problem is occasionally reproduced by fstest generic/333. To avoid this problem, update the fast path algorithm to skip key inserts that fail due to inability to acquire needed journal reservation without immediately breaking out of the loop. Instead, insert as many keys as possible, zap the sequence numbers to mark them as processed, and then fall back to the slow path to process the remaining set in journal order. This reduces the amount of journal reservation that might be required to flush the entire buffer and increases the odds that the slow path is able to move the journal pin forward and free up space as keys are processed. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Brian Foster	8bff9875a6	bcachefs: use dedicated workqueue for tasks holding write refs A workqueue resource deadlock has been observed when running fsck on a filesystem with a full/stuck journal. fsck is not currently able to repair the fs due to fairly rapid emergency shutdown, but rather than exit gracefully the fsck process hangs during the shutdown sequence. Fortunately this is easily recoverable from userspace, but the root cause involves code shared between the kernel and userspace and so should be addressed. The deadlock scenario involves the main task in the bch2_fs_stop() -> bch2_fs_read_only() path waiting on write references to drain with the fs state lock held. A bch2_read_only_work() workqueue task is scheduled on the system_long_wq, blocked on the state lock. Finally, various other write ref holding workqueue tasks are scheduled to run on the same workqueue and must complete in order to release references that the initial task is waiting on. To avoid this problem, we can split the dependent workqueue tasks across different workqueues. It's a bit of a waste to create a dedicated wq for the read-only worker, but there are several tasks throughout the fs that follow the pattern of acquiring a write reference and then scheduling to the system wq. Use a local wq for such tasks to break the subtle dependency between these and the read-only worker. Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Brian Foster	76c70c57f0	bcachefs: remove unused bch2_trans_log_msg() Signed-off-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	ffc76edbbe	bcachefs: Fix bch2_verify_bucket_evacuated() We were going into an infinite loop when printing out backpointers, due to never incrementing bp_offset - whoops. Also limit the number of backpointers we print to 10; this is debug code and we only need to print a sample, not all of them. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	d59ca7e8c0	bcachefs: verify_bucket_evacuated() -> set_btree_iter_dontneed() This should help with excessive 'would deadlock' transaction restarts. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	330970c2c6	bcachefs: Make reconstruct_alloc quieter We shouldn't be printing out fsck errors for expected errors - this helps make test logs more readable, and makes it easier to see what the actual failure was. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	3e36e572f1	bcachefs: Fix an unhandled transaction restart error This is a bit awkward: we're passing around a btree_trans, but we're not in a context where transaction restarts are handled - we should try to come up with a better way to denote situations like this. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	dc6274bcb8	bcachefs: Fix nocow write path closure bug With regular waitlists, we need to ensure we always call finish_wait(). With closures, the equivalent is that we need to call closure_sync() before returning with a stack-allocated closure. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	ac77810cb4	bcachefs: Nocow write error path fix The nocow write error path was iterating over pointers in an extent, aftre we'd dropped btree locks - oops. Fortunately we'd already stashed what we need in nocow_lock_bucket, so use that instead. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:58 -04:00
Kent Overstreet	abab7609de	bcachefs: Fix bch2_extent_fallocate() in nocow mode When we allocate disk space, we need to be incrementing the WRITE io clock, which perhaps should be renamed to sectors allocated - copygc uses this io clock to know when to run. Also, we should be incrementing the same clock when allocating btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	711bf946d5	bcachefs: Add an assert in inode_write for -ENOENT Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	9edbcc72f6	bcachefs: Fix bch2_evict_subvolume_inodes() This fixes a bug in bch2_evict_subvolume_inodes(): d_mark_dontcache() doesn't handle the case where i_count is already 0, we need to grab and put the inode in order for it to be dropped. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	e1e7ecafe6	bcachefs: Improve error handling in bch2_ioctl_subvolume_destroy() Pure style fixes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	2d33036ca9	bcachefs: Fix for 'missing subvolume' error Subvolumes, including their root inodes, get deleted asynchronously after an unlink. But we still need to ensure that we tell the VFS the inode has been deleted, otherwise VFS writeback could fire after asynchronous deletion has finished, and try to write to an inode/subvolume that no longer exists. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	56cc033dfc	bcachefs: Don't run transaction hooks multiple times transaction hooks aren't supposed to run unless we know the transaction is going to commit succesfully: this fixes a bug with attempting to delete a subvolume multiple times. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	26559553e4	bcachefs: Add a fallback when journal_keys doesn't fit in ram We may end up in a situation where allocating the buffer for the sorted journal_keys fails - but it would likely succeed, post compaction where we drop duplicates. We've had reports of this allocation failing, so this adds a slowpath to do the compaction incrementally. This is only a band-aid fix; we need to look at limiting the number of keys in the journal based on the amount of system RAM. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	2f0815840c	bcachefs: Improve the backpointer to missing extent message We now print the pos where the backpointer was found in the btree, as well as the exact bucket:bucket_offset of the data, to aid in grepping through logs. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00
Kent Overstreet	40a18fe273	bcachefs: Add error message for failing to allocate sorted journal keys Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:57 -04:00

1 2 3 4 5 ...

1217534 Commits