iv/linux

Author	SHA1	Message	Date
Kent Overstreet	280249b9d9	bcachefs: Correctly order flushes and journal writes on multi device filesystems All writes prior to a journal write need to be flushed before the journal write itself happens. On single device filesystems, it suffices to mark the write with REQ_PREFLUSH|REQ_FUA, but on multi device filesystems we need to issue flushes to every device - and wait for them to complete - before issuing the journal writes. Previously, we were issuing flushes to every device, but we weren't waiting for them to complete before issuing the journal writes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	ed9d58a2b1	bcachefs: Run jset_validate in write path as well This is because we had a bug where we were writing out journal entries with garbage last_seq, and not catching it. Also, completely ignore jset->last_seq when JSET_NO_FLUSH is true, because of aforementioned bug, but change the write path to set last_seq to 0 when JSET_NO_FLUSH is true. Minor other cleanups and comments. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	07a1006ae8	bcachefs: Reduce/kill BKEY_PADDED use With various newer key types - stripe keys, inline data extents - the old approach of calculating the maximum size of the value is becoming more and more error prone. Better to switch to bkey_on_stack, which can dynamically allocate if necessary to handle any size bkey. In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:50 -04:00
Kent Overstreet	3187aa8d57	bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much Previously, we were using BTREE_INSERT_RESERVE in a lot of places where it no longer makes sense. - we now have more open_buckets than we used to, and the reserves work better, so we shouldn't need to use BTREE_INSERT_RESERVE just because we're holding open_buckets pinned anymore. - We have the btree key cache for updates to the alloc btree, meaning we no longer need the btree reserve to ensure the allocator can make forward progress. This means that we should only need a reserve for btree updates to ensure that copygc can make forward progress. Since it's now just for copygc, we can also fold RESERVE_BTREE into RESERVE_MOVINGGC (the allocator's freelist reserve). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:50 -04:00
Kent Overstreet	5d32c5bb07	bcachefs: Be more conservative about journal pre-reservations - Try to always keep 1/8th of the journal free, on top of pre-reservations - Move the check for whether the journal is stuck to bch2_journal_space_available, and make it only fire when there aren't any journal writes in flight (that might free up space by updating last_seq) Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	adbcada43f	bcachefs: Don't require flush/fua on every journal write This patch adds a flag to journal entries which, if set, indicates that they weren't done as flush/fua writes. - non flush/fua journal writes don't update last_seq (i.e. they don't free up space in the journal), thus the journal free space calculations now check whether nonflush journal writes are currently allowed (i.e. are we low on free space, or would doing a flush write free up a lot of space in the journal) - write_delay_ms, the user configurable option for when open journal entries are automatically written, is now interpreted as the max delay between flush journal writes (default 1 second). - bch2_journal_flush_seq_async is changed to ensure a flush write >= the requested sequence number has happened - journal read/replay must now ignore, and blacklist, any journal entries newer than the most recent flush entry in the journal. Also, the way the read_entire_journal option is handled has been improved; struct journal_replay now has an entry, 'ignore', for entries that were read but should not be used. - assorted refactoring and improvements related to journal read in journal_io.c and recovery.c Previously, we'd have to issue a flush/fua write every time we accumulated a full journal entry - typically the bucket size. Now we need to issue them much less frequently: when an fsync is requested, or it's been more than write_delay_ms since the last flush, or when we need to free up space in the journal. This is a significant performance improvement on many write heavy workloads. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	b6df4325cd	bcachefs: Improve journal free space calculations Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	ebb84d0941	bcachefs: Increase journal pipelining This patch increases the maximum journal buffers in flight from 2 to 4 - this will be particularly helpful when in the future we stop requiring flush+fua for every journal write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	c5bb169034	bcachefs: Fix journal_flush_seq() The error check was inverted - leading fsyncs to get stuck and hang, oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	33b3b1dc0f	bcachefs: Optimize bch2_journal_flush_seq_async() Avoid taking the journal lock if we don't have to. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:48 -04:00
Kent Overstreet	f302055077	bcachefs: Fix an rcu splat bch2_bucket_alloc() requires rcu_read_lock() to be held. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:48 -04:00
Kent Overstreet	b7a9bbfc1b	bcachefs: Move journal reclaim to a kthread This is to make tracing easier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:48 -04:00
Kent Overstreet	ed0e24c099	bcachefs: Be more precise with journal error reporting We were incorrectly detecting a journal deadlock - the journal filling up - when only the journal pin fifo had filled up; if the journal pin fifo is full that just means we need to wait on reclaim. This plumbs through better error reporting so we can better discriminate in the journal_res_get path what's going on. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:47 -04:00
Kent Overstreet	e8c851b351	bcachefs: Add an ioctl for resizing journal on a device Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:47 -04:00
Kent Overstreet	e8bd002b23	bcachefs: Dump journal state when the journal deadlocks Currently tracking down one of these bugs. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:47 -04:00
Kent Overstreet	158eecb88e	bcachefs: Assorted journal refactoring Improved the way we track various state by adding j->err_seq, which records the first journal sequence number that encountered an error being written, and j->last_empty_seq, which records the most recent journal entry that was completely empty. Also, use the low bits of the journal sequence number to index the corresponding journal_buf. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:47 -04:00
Kent Overstreet	1676a398d3	bcachefs: Delete dead journalling code Usage of the journal has gotten somewhat simpler over time - neat. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:47 -04:00
Kent Overstreet	8be901d5d4	bcachefs: Always write a journal entry when stopping journal This is to fix a (harmless) bug where the read clock hand in the superblock doesn't match the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:45 -04:00
Kent Overstreet	61ce38b862	bcachefs: Fix journal_seq_copy() We also need to update the journal's bloom filter of inode numbers that each journal write has updates for - in case the inode gets evicted before it gets fsynced. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:44 -04:00
Kent Overstreet	7807e14384	bcachefs: Convert various code to printbuf printbufs know how big the buffer is that was allocated, so we can get rid of the random PAGE_SIZEs all over the place. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:43 -04:00
Kent Overstreet	89fd25be70	bcachefs: Use x-macros for data types Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	7fffc85baf	bcachefs: Add an internal option for reading entire journal To be used by the debug tool that dumps the contents of the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	4efe71a646	bcachefs: Always give out journal pre-res if we already have one This is better than skipping the journal pre-reservation if we already have one - we should still account for the journal reservation we're going to have to get. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	495aabede3	bcachefs: Add debug code to print btree transactions Intended to help debug deadlocks, since we can't use lockdep to check btree node lock ordering. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	00b8ccf707	bcachefs: Interior btree updates are now fully transactional We now update the alloc info (bucket sector counts) atomically with journalling the update to the interior btree nodes, and we also set new btree roots atomically with the journalled part of the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	b72633aed0	bcachefs: Switch a BUG_ON() to a warning This has popped and thus needs to be debugged, but the assertion firing isn't necessarily fatal so switch it to a warning. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:37 -04:00
Kent Overstreet	aef90ce085	bcachefs: kill bch2_extent_has_device() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:32 -04:00
Kent Overstreet	1f7d45beb7	bcachefs: Fix journal shutdown path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:21 -04:00
Kent Overstreet	644d180b05	bcachefs: Journal replay refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:20 -04:00
Kent Overstreet	478259b749	bcachefs: delete duplicated code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:20 -04:00
Kent Overstreet	1dd7f9d98d	bcachefs: Rewrite journal_seq_blacklist machinery Now, we store blacklisted journal sequence numbers in the superblock, not the journal: this helps to greatly simplify the code, and more importantly it's now implemented in a way that doesn't require all btree nodes to be visited before starting the journal - instead, we unconditionally blacklist the next 4 journal sequence numbers after an unclean shutdown. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:20 -04:00
Kent Overstreet	3a0e06db71	bcachefs: Assorted preemption fixes Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:19 -04:00
Kent Overstreet	134915f3d3	bcachefs: Go rw lazily Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:18 -04:00
Kent Overstreet	db6447b383	bcachefs: fix a faulty assertion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:17 -04:00
Kent Overstreet	68ef94a63c	bcachefs: Add a pre-reserve mechanism for the journal Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:17 -04:00
Kent Overstreet	9ace606e93	bcachefs: Don't block on reclaim_lock from journal_res_get When we're doing btree updates from journal flush, this becomes a locking inversion Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:17 -04:00
Kent Overstreet	03d5eaed86	bcachefs: bch2_journal_space_available improvements Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:17 -04:00
Kent Overstreet	2384db8f32	bcachefs: Separate discards from rest of journal reclaim Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:17 -04:00
Kent Overstreet	0ce2dbbe99	bcachefs: ja->discard_idx, ja->dirty_idx Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:17 -04:00
Kent Overstreet	6409c6a0ae	bcachefs: use correct wq for journal reclaim Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	e5a66496a0	bcachefs: Journal reclaim refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	2d3b581039	bcachefs: Better journal debug Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	c8cc5b3e3f	bcachefs: Don't get journal reservation until after we know insert will succeed Checking if we can do the insert after getting the journal reservation means potentially wasting space in the journal, which will break the new pre reservation mechanism Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	8db2acde2f	bcachefs: fix integer underflow in journal code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	d16b4a77a5	bcachefs: Assorted journal refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	768ac63924	bcachefs: Add a mechanism for blocking the journal Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	eac3ca0f49	bcachefs: New journal_entry_res mechanism Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:15 -04:00
Kent Overstreet	9166b41db1	bcachefs: s/usage_lock/mark_lock better describes what it's for, and we're going to call a new lock usage_lock Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:13 -04:00
Kent Overstreet	9d11058a78	bcachefs: fix waiting on an open journal entry Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:12 -04:00
Kent Overstreet	9ca53b55f7	bcachefs: gc now operates on second set of bucket marks This means we can now use gc to verify the allocation information - important for testing persistent alloc info Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:12 -04:00

60 Commits