linux

iv/linux

Author	SHA1	Message	Date
Kent Overstreet	30ef633a0b	bcachefs: Refactor journal code to not use unwritten_idx It makes the code more readable if we work off of sequence numbers, instead of direct indexes into the array of journal buffers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:26 -04:00
Kent Overstreet	75ef2c59bc	bcachefs: Start moving debug info from sysfs to debugfs In sysfs, files can only output at most PAGE_SIZE. This is a problem for debug info that needs to list an arbitrary number of times, and because of this limit some of our debug info has been terser and harder to read than we'd like. This patch moves info about journal pins and cached btree nodes to debugfs, and greatly expands and improves the output we return. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:26 -04:00
Kent Overstreet	b66b2bc0f6	bcachefs: Revert "Ensure journal doesn't get stuck in nochanges mode" This patch was originally to work around the journal geting stuck in nochanges mode - but that was just a hack, we needed to fix the actual bug. It should be fixed now, so revert it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:25 -04:00
Kent Overstreet	e201f70b11	bcachefs: Fix for journal getting stuck The journal can get stuck if we need to get a journal reservation for something we have a pre-reservation for, but aren't able to reclaim space, or if the pin fifo is full - it's impractical to resize the pin fifo at runtime. Previously, we reserved 8 entries in the pin fifo for pre-reservations, but that seems small - we're seeing the journal occasionally get stuck. Let's reserve a quarter of it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:25 -04:00
Kent Overstreet	5b2e599f50	bcachefs: bch2_journal_noflush_seq() Add bch2_journal_noflush_seq(), for telling the journal that entries before a given sequence number should not be flushes - to be used by an upcoming allocator optimization. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:20 -04:00
Kent Overstreet	0e030f5e20	bcachefs: Kill journal buf bloom filter This was used for recording which inodes have been modified by in flight journal writes, but was broken and has been superceded. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:16 -04:00
Kent Overstreet	fae1157d18	bcachefs: Ensure journal doesn't get stuck in nochanges mode This tweaks the journal code to always act as if there's space available in nochanges mode, when we're not going to be doing any writes. This helps in recovering filesystems that won't mount because they need journal replay and the journal has gotten stuck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>	2023-10-22 17:09:15 -04:00
Kent Overstreet	8ce600d447	bcachefs: Fix for btree_gc repairing interior btree ptrs Using the normal transaction commit path to insert and journal updates to interior nodes hadn't been done before this repair code was written, not surprising that there was a bug. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:01 -04:00
Kent Overstreet	671cc8a51b	bcachefs: Eliminate memory barrier from fast path of journal_preres_put() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	2940295c97	bcachefs: Be more careful about JOURNAL_RES_GET_RESERVED JOURNAL_RES_GET_RESERVED should only be used for updatse that need to be done to free up space in the journal. In particular, when we're flushing keys from the key cache, if we're flushing them out of order we shouldn't be using it, since we're using up our remaining space in the journal without dropping a pin that will let us make forward progress. With this patch, BTREE_INSERT_JOURNAL_RECLAIM without BTREE_INSERT_JOURNAL_RESERVED may return -EAGAIN - we can't wait on journal reclaim if we're already in journal reclaim. This means we need to propagate these errors up to journal reclaim, indicating that flushing a journal pin should be retried in the future. This is prep work for a patch to change the way journal reclaim works, to split out flushing key cache keys because the btree key cache is too dirty from journal reclaim because we need space in the journal. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	280249b9d9	bcachefs: Correctly order flushes and journal writes on multi device filesystems All writes prior to a journal write need to be flushed before the journal write itself happens. On single device filesystems, it suffices to mark the write with REQ_PREFLUSH\|REQ_FUA, but on multi device filesystems we need to issue flushes to every device - and wait for them to complete - before issuing the journal writes. Previously, we were issuing flushes to every device, but we weren't waiting for them to complete before issuing the journal writes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	5d32c5bb07	bcachefs: Be more conservation about journal pre-reservations - Try to always keep 1/8th of the journal free, on top of pre-reservations - Move the check for whether the journal is stuck to bch2_journal_space_available, and make it only fire when there aren't any journal writes in flight (that might free up space by updating last_seq) Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	adbcada43f	bcachefs: Don't require flush/fua on every journal write This patch adds a flag to journal entries which, if set, indicates that they weren't done as flush/fua writes. - non flush/fua journal writes don't update last_seq (i.e. they don't free up space in the journal), thus the journal free space calculations now check whether nonflush journal writes are currently allowed (i.e. are we low on free space, or would doing a flush write free up a lot of space in the journal) - write_delay_ms, the user configurable option for when open journal entries are automatically written, is now interpreted as the max delay between flush journal writes (default 1 second). - bch2_journal_flush_seq_async is changed to ensure a flush write >= the requested sequence number has happened - journal read/replay must now ignore, and blacklist, any journal entries newer than the most recent flush entry in the journal. Also, the way the read_entire_journal option is handled has been improved; struct journal_replay now has an entry, 'ignore', for entries that were read but should not be used. - assorted refactoring and improvements related to journal read in journal_io.c and recovery.c Previously, we'd have to issue a flush/fua write every time we accumulated a full journal entry - typically the bucket size. Now we need to issue them much less frequently: when an fsync is requested, or it's been more than write_delay_ms since the last flush, or when we need to free up space in the journal. This is a significant performance improvement on many write heavy workloads. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	ebb84d0941	bcachefs: Increase journal pipelining This patch increases the maximum journal buffers in flight from 2 to 4 - this will be particularly helpful when in the future we stop requiring flush+fua for every journal write. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	158eecb88e	bcachefs: Assorted journal refactoring Improved the way we track various state by adding j->err_seq, which records the first journal sequence number that encountered an error being written, and j->last_empty_seq, which records the most recent journal entry that was completely empty. Also, use the low bits of the journal sequence number to index the corresponding journal_buf. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:47 -04:00
Kent Overstreet	1676a398d3	bcachefs: Delete dead journalling code Usage of the journal has gotten somewhat simpler over time - neat. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:47 -04:00
Kent Overstreet	61ce38b862	bcachefs: Fix journal_seq_copy() We also need to update the journal's bloom filter of inode numbers that each journal write has upudates for - in case the inode gets evicted before it gets fsynced. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:44 -04:00
Kent Overstreet	7807e14384	bcachefs: Convert various code to printbuf printbufs know how big the buffer is that was allocated, so we can get rid of the random PAGE_SIZEs all over the place. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:43 -04:00
Kent Overstreet	4efe71a646	bcachefs: Always give out journal pre-res if we already have one This is better than skipping the journal pre-reservation if we already have one - we should still acount for the journal reservation we're going to have to get. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	00b8ccf707	bcachefs: Interior btree updates are now fully transactional We now update the alloc info (bucket sector counts) atomically with journalling the update to the interior btree nodes, and we also set new btree roots atomically with the journalled part of the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	96e2aa1be5	bcachefs: Add a mechanism for passing extra journal entries to bch2_trans_commit() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	15a07f2eae	bcachefs: Improve lockdep annotation in journalling code bch2_journal_res_get() in nonblocking mode is equivalent to a trylock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:38 -04:00
Kent Overstreet	2a9101a989	bcachefs: Refactor bch2_trans_commit() path Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:30 -04:00
Kent Overstreet	478259b749	bcachefs: delete duplicated code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:20 -04:00
Kent Overstreet	1dd7f9d98d	bcachefs: Rewrite journal_seq_blacklist machinery Now, we store blacklisted journal sequence numbers in the superblock, not the journal: this helps to greatly simplify the code, and more importantly it's now implemented in a way that doesn't require all btree nodes to be visited before starting the journal - instead, we unconditionally blacklist the next 4 journal sequence numbers after an unclean shutdown. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:20 -04:00
Kent Overstreet	68ef94a63c	bcachefs: Add a pre-reserve mechanism for the journal Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:17 -04:00
Kent Overstreet	8db2acde2f	bcachefs: fix integer underflow in journal code Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	d16b4a77a5	bcachefs: Assorted journal refactoring Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	768ac63924	bcachefs: Add a mechanism for blocking the journal Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:16 -04:00
Kent Overstreet	eac3ca0f49	bcachefs: New journal_entry_res mechanism Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:15 -04:00
Kent Overstreet	f1a79365a7	bcachefs: Don't block on journal reservation with btree locks held Fixes a deadlock between the allocator thread, when it first starts up, and journal replay Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:11 -04:00
Kent Overstreet	2252aa271c	bcachefs: btree gc refactoring prep work for erasure coding Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:10 -04:00
Kent Overstreet	1c6fdbd8f2	bcachefs: Initial commit Initially forked from drivers/md/bcache, bcachefs is a new copy-on-write filesystem with every feature you could possibly want. Website: https://bcachefs.org Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:07 -04:00

33 Commits