iv/linux

Author	SHA1	Message	Date
Kent Overstreet	f0a3a2ccab	bcachefs: Journal seq now incremented at entry open, not close This patch changes journal_entry_open() to initialize the new journal entry, not __journal_entry_close(). This also means that journal_cur_seq() refers to the sequence number of the last journal entry when we don't have an open journal entry, not the next one. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:26 -04:00
Kent Overstreet	dfc0f7ea00	bcachefs: bch2_journal_halt() now takes journal lock This change is prep work for moving some work from __journal_entry_close() to journal_entry_open(): without this change, journal_entry_open() doesn't know if it's going to be able to open a new journal entry until the cmpxchg loop, meaning it can't create the new journal pin entry and update other global state because those have to be done prior to the cmpxchg opening the new journal entry. Fortunately, we don't call bch2_journal_halt() from interrupt context. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:26 -04:00
Kent Overstreet	fbec3b8800	bcachefs: Kill JOURNAL_NEED_WRITE This replaces the journal flag JOURNAL_NEED_WRITE with per-journal buf state - more explicit, and solving a race in the old code that would lead to entries being opened and written unnecessarily. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:26 -04:00
Kent Overstreet	506bac7e59	bcachefs: Delete some dead journal code Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:26 -04:00
Kent Overstreet	75ef2c59bc	bcachefs: Start moving debug info from sysfs to debugfs In sysfs, files can only output at most PAGE_SIZE. This is a problem for debug info that needs to list an arbitrary number of items, and because of this limit some of our debug info has been terser and harder to read than we'd like. This patch moves info about journal pins and cached btree nodes to debugfs, and greatly expands and improves the output we return. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:26 -04:00
Kent Overstreet	fa8e94faee	bcachefs: Heap allocate printbufs This patch changes printbufs to dynamically allocate and reallocate a buffer as needed. Stack usage has become a bit of a problem, and a major cause of that has been static size string buffers on the stack. The most involved part of this refactoring is that printbufs must now be exited with printbuf_exit(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:25 -04:00
Kent Overstreet	eac91bf27f	bcachefs: Fix bch2_journal_pins_to_text() When key cache pins were put onto their own list, we neglected to update bch2_journal_pins_to_text() to print them. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:25 -04:00
Kent Overstreet	e201f70b11	bcachefs: Fix for journal getting stuck The journal can get stuck if we need to get a journal reservation for something we have a pre-reservation for, but aren't able to reclaim space, or if the pin fifo is full - it's impractical to resize the pin fifo at runtime. Previously, we reserved 8 entries in the pin fifo for pre-reservations, but that seems small - we're seeing the journal occasionally get stuck. Let's reserve a quarter of it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:25 -04:00
Kent Overstreet	5b2e599f50	bcachefs: bch2_journal_noflush_seq() Add bch2_journal_noflush_seq(), for telling the journal that entries before a given sequence number should not be flushed - to be used by an upcoming allocator optimization. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:20 -04:00
Kent Overstreet	abe19d458e	bcachefs: Refactor open_bucket code Prep work for adding a hash table of open buckets - instead of embedding a bch_extent_ptr, we need to refer to the bucket directly so that we're not calling sector_to_bucket() in the hash table lookup code, which has an expensive divide. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:20 -04:00
Kent Overstreet	8511632d44	bcachefs: Journal initialization fixes This fixes a rare bug when mounting & unmounting RO - flushing a clean filesystem that never went RO should be a no op. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:19 -04:00
Kent Overstreet	4141fde0be	bcachefs: Fix bch2_journal_meta() This patch ensures that the journal entry written gets written as a flush entry, which is important for the shutdown path - the last entry written needs to be a flush entry. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:19 -04:00
Kent Overstreet	77170d0dd7	bcachefs: bch2_bucket_alloc_new_fs() no longer depends on bucket marks Now that bch2_bucket_alloc_new_fs() isn't looking at bucket marks to decide what buckets are eligible to allocate, we can clean up the filesystem initialization and device add paths. Previously, we had to use ancient code to mark superblock/journal buckets in the in memory bucket marks as we allocated them, and then zero that out and re-do that marking using the newer transactional bucket mark paths. Now, we can simply delete the in-memory bucket marking. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:19 -04:00
Kent Overstreet	991ba02112	bcachefs: Add more time_stats This adds more latency/event measurements and breaks some apart into more events. Journal writes are broken apart into flush writes and noflush writes, btree compactions are broken out from btree splits, btree mergers are added, as well as btree_interior_updates - foreground and total. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:18 -04:00
Kent Overstreet	1d81313f22	bcachefs: Make __bch2_journal_debug_to_text() more readable Switch to one line of output per pr_buf() call - longer lines but quite a bit more readable. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:18 -04:00
Kent Overstreet	2430e72f42	bcachefs: Convert journal sysfs params to regular options This converts journal_write_delay, journal_flush_disabled, and journal_reclaim_delay to normal filesystems options, and also adds them to the superblock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:18 -04:00
Kent Overstreet	9be1efe9c5	bcachefs: Fix error reporting from bch2_journal_flush_seq - bch2_journal_halt() was unconditionally overwriting j->err_seq, the sequence number that we failed to write - journal_write_done was updating seq_ondisk and flushed_seq_ondisk even for writes that errored, which broke the way bch2_journal_flush_seq_async() locklessly checked for completions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:17 -04:00
Kent Overstreet	531b69e9af	bcachefs: Convert journal BUG_ON() to a warning It's definitely indicative of a bug if we request to flush a journal sequence number that hasn't happened yet, but it's more useful if we warn and print out the relevant sequence numbers instead of just dying. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:17 -04:00
Kent Overstreet	0e030f5e20	bcachefs: Kill journal buf bloom filter This was used for recording which inodes have been modified by in flight journal writes, but was broken and has been superseded. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:16 -04:00
Kent Overstreet	731bdd2eff	bcachefs: Add a workqueue for btree io completions Also, clean up workqueue usage - we shouldn't be using system workqueues, pretty much everything we do needs to be on our own WQ_MEM_RECLAIM workqueues. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:04 -04:00
Kent Overstreet	6ebe32b94c	bcachefs: Fix locking in __bch2_set_nr_journal_buckets() We weren't holding mark_lock correctly - it's needed for the new_fs path. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:03 -04:00
Kent Overstreet	1784d43a88	bcachefs: Fix usage of last_seq + encryption jset->last_seq is in the region that's encrypted - on journal write completion, we were using it and getting garbage. This patch shadows it to fix. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	2ce867df31	bcachefs: Make sure to initialize j->last_flushed If the journal reclaim thread makes it to the timeout without ever initializing j->last_flushed, we could end up sleeping for a very long time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	633632ef1b	bcachefs: Simplify bch2_set_nr_journal_buckets() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	d62ab355d7	bcachefs: Fix bch2_trans_mark_dev_sb() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	241e26369e	bcachefs: Don't flush btree writes more aggressively because of btree key cache We need to flush the btree key cache when it's too dirty, because otherwise the shrinker won't be able to reclaim memory - this is done by journal reclaim. But journal reclaim also kicks btree node writes: this meant that btree node writes were getting kicked much too often just because we needed to flush btree key cache keys. This patch splits journal pins into two different lists, and teaches journal reclaim to not flush btree node writes when it only needs to flush key cache keys. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	671cc8a51b	bcachefs: Eliminate memory barrier from fast path of journal_preres_put() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	2940295c97	bcachefs: Be more careful about JOURNAL_RES_GET_RESERVED JOURNAL_RES_GET_RESERVED should only be used for updates that need to be done to free up space in the journal. In particular, when we're flushing keys from the key cache, if we're flushing them out of order we shouldn't be using it, since we're using up our remaining space in the journal without dropping a pin that will let us make forward progress. With this patch, BTREE_INSERT_JOURNAL_RECLAIM without BTREE_INSERT_JOURNAL_RESERVED may return -EAGAIN - we can't wait on journal reclaim if we're already in journal reclaim. This means we need to propagate these errors up to journal reclaim, indicating that flushing a journal pin should be retried in the future. This is prep work for a patch to change the way journal reclaim works, to split out flushing key cache keys because the btree key cache is too dirty from journal reclaim because we need space in the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	24db24c749	bcachefs: Don't make foreground writes wait behind journal reclaim too long Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:58 -04:00
Kent Overstreet	7c8b166e58	bcachefs: Increase default journal size The default was 1/256th of the device and capped at 512MB, which is fairly tiny these days. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:57 -04:00
Kent Overstreet	ba401eaac3	bcachefs: Don't list non journal devs in journal_debug_to_text() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:56 -04:00
Kent Overstreet	9ae28f824e	bcachefs: Start journal reclaim thread earlier Especially in userspace, we sometimes run into resource exhaustion issues with starting up threads after mark and sweep/fsck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:54 -04:00
Kent Overstreet	5ea037d03c	bcachefs: Assert that we're not trying to flush journal seq in the future Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	6a16ad951a	bcachefs: Add flushed_seq_ondisk to journal_debug_to_text() Also, make the wait in bch2_journal_flush_seq() interruptible, not just killable. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	4b8f89afd4	bcachefs: Fixes/improvements for journal entry reservations This fixes some arithmetic bugs in "bcachefs: Journal updates to dev usage" - additionally, it cleans things up by switching everything that goes in every journal entry to the journal_entry_res mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	2abe542087	bcachefs: Persist 64 bit io clocks Originally, bcachefs - going back to bcache - stored, for each bucket, a 16 bit counter corresponding to how long it had been since the bucket was read from. But, this required periodically rescaling counters on every bucket to avoid wraparound. That wasn't an issue in bcache, where we'd periodically rewrite the per bucket metadata all at once, but in bcachefs we're trying to avoid having to walk every single bucket. This patch switches to persisting 64 bit io clocks, corresponding to the 64 bit bucket timestamps introduced in the previous patch with KEY_TYPE_alloc_v2. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	e4c3f386b6	bcachefs: Improve diagnostics when journal entries are missing There's an outstanding bug with journal entries going missing in journal replay. This patch adds code to print out the physical locations of the journal entries around the missing entry(ies), which should make debugging easier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	bfcf840ddf	bcachefs: Mark superblocks transactionally More work towards getting rid of the in memory struct bucket: this patch adds code for marking superblock and journal buckets via the btree, and uses it in the device add and journal resize paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	fdbb88ac01	bcachefs: Fix a faulty assertion If journal replay hasn't finished, the journal can't be empty - oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	280249b9d9	bcachefs: Correctly order flushes and journal writes on multi device filesystems All writes prior to a journal write need to be flushed before the journal write itself happens. On single device filesystems, it suffices to mark the write with REQ_PREFLUSH|REQ_FUA, but on multi device filesystems we need to issue flushes to every device - and wait for them to complete - before issuing the journal writes. Previously, we were issuing flushes to every device, but we weren't waiting for them to complete before issuing the journal writes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	ed9d58a2b1	bcachefs: Run jset_validate in write path as well This is because we had a bug where we were writing out journal entries with garbage last_seq, and not catching it. Also, completely ignore jset->last_seq when JSET_NO_FLUSH is true, because of aforementioned bug, but change the write path to set last_seq to 0 when JSET_NO_FLUSH is true. Minor other cleanups and comments. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	07a1006ae8	bcachefs: Reduce/kill BKEY_PADDED use With various newer key types - stripe keys, inline data extents - the old approach of calculating the maximum size of the value is becoming more and more error prone. Better to switch to bkey_on_stack, which can dynamically allocate if necessary to handle any size bkey. In particular we also want to get rid of BKEY_EXTENT_VAL_U64s_MAX. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:50 -04:00
Kent Overstreet	3187aa8d57	bcachefs: Don't use BTREE_INSERT_USE_RESERVE so much Previously, we were using BTREE_INSERT_USE_RESERVE in a lot of places where it no longer makes sense. - we now have more open_buckets than we used to, and the reserves work better, so we shouldn't need to use BTREE_INSERT_USE_RESERVE just because we're holding open_buckets pinned anymore. - We have the btree key cache for updates to the alloc btree, meaning we no longer need the btree reserve to ensure the allocator can make forward progress. This means that we should only need a reserve for btree updates to ensure that copygc can make forward progress. Since it's now just for copygc, we can also fold RESERVE_BTREE into RESERVE_MOVINGGC (the allocator's freelist reserve). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:50 -04:00
Kent Overstreet	5d32c5bb07	bcachefs: Be more conservative about journal pre-reservations - Try to always keep 1/8th of the journal free, on top of pre-reservations - Move the check for whether the journal is stuck to bch2_journal_space_available, and make it only fire when there aren't any journal writes in flight (that might free up space by updating last_seq) Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	adbcada43f	bcachefs: Don't require flush/fua on every journal write This patch adds a flag to journal entries which, if set, indicates that they weren't done as flush/fua writes. - non flush/fua journal writes don't update last_seq (i.e. they don't free up space in the journal), thus the journal free space calculations now check whether nonflush journal writes are currently allowed (i.e. are we low on free space, or would doing a flush write free up a lot of space in the journal) - write_delay_ms, the user configurable option for when open journal entries are automatically written, is now interpreted as the max delay between flush journal writes (default 1 second). - bch2_journal_flush_seq_async is changed to ensure a flush write >= the requested sequence number has happened - journal read/replay must now ignore, and blacklist, any journal entries newer than the most recent flush entry in the journal. Also, the way the read_entire_journal option is handled has been improved; struct journal_replay now has an entry, 'ignore', for entries that were read but should not be used. - assorted refactoring and improvements related to journal read in journal_io.c and recovery.c Previously, we'd have to issue a flush/fua write every time we accumulated a full journal entry - typically the bucket size. Now we need to issue them much less frequently: when an fsync is requested, or it's been more than write_delay_ms since the last flush, or when we need to free up space in the journal. This is a significant performance improvement on many write heavy workloads. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	b6df4325cd	bcachefs: Improve journal free space calculations Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	ebb84d0941	bcachefs: Increase journal pipelining This patch increases the maximum journal buffers in flight from 2 to 4 - this will be particularly helpful when in the future we stop requiring flush+fua for every journal write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	c5bb169034	bcachefs: Fix journal_flush_seq() The error check was inverted - leading fsyncs to get stuck and hang, oops. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	33b3b1dc0f	bcachefs: Optimize bch2_journal_flush_seq_async() Avoid taking the journal lock if we don't have to. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:48 -04:00
Kent Overstreet	f302055077	bcachefs: Fix an rcu splat bch2_bucket_alloc() requires rcu_read_lock() to be held. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:48 -04:00

99 Commits