linux

iv/linux

Author	SHA1	Message	Date
Kent Overstreet	aae15aafcd	bcachefs: New and improved topology repair code This splits out btree topology repair into a separate pass, and makes some improvements: - When we have to pick which of two overlapping nodes to drop keys from, we use the btree node header sequence number to preserve the newer node - the gc code has been changed so that it doesn't bail out if we're continuing/ignoring on fsck error - this way the dump tool can skip running the repair pass but still walk all reachable metadata - add a new superblock flag indicating when a filesystem is known to have btree topology issues, and the topology repair pass should be run - changing the start/end of a node might mean keys in that node have to be deleted: this patch handles that better by splitting it out into a separate function and running it explicitly in the topology repair code, previously those keys were only being dropped when the btree node was read in. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	4932e07ea0	bcachefs: Fix key cache assertion Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:02 -04:00
Kent Overstreet	d62ab355d7	bcachefs: Fix bch2_trans_mark_dev_sb() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:09:00 -04:00
Kent Overstreet	9d8022db1c	bcachefs: Eliminate more PAGE_SIZE uses In userspace, we don't really have a well defined PAGE_SIZE and shouln't be relying on it. This is some more incremental work to remove references to it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:59 -04:00
Kent Overstreet	2c944fa12d	bcachefs: Add a print statement for when we go read-write Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:56 -04:00
Kent Overstreet	2436cb9fad	bcachefs: Use x-macros for more enums This patch standardizes all the enums that have associated string tables (probably more enums should have string tables). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:55 -04:00
Kent Overstreet	41f8b09edc	bcachefs: Rename BTREE_ID enums for consistency with other enums Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:55 -04:00
Kent Overstreet	9620c3ec2f	bcachefs: Add a mempool for the replicas delta list Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:54 -04:00
Kent Overstreet	9ae28f824e	bcachefs: Start journal reclaim thread earlier Especially in userspace, we sometime run into resource exhaustion issues with starting up threads after mark and sweep/fsck. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:54 -04:00
Kent Overstreet	2ee47eec44	bcachefs: Fix for copygc getting stuck waiting for reserve to be filled This fixes a regression from the patch bcachefs: Fix copygc dying on startup In general only the allocator thread itself should be updating ca->allocator_state, the thread waking up the allocator setting it is an ugly hack only needed to avoid racing with the copygc threads when we're first starting up. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:54 -04:00
Kent Overstreet	51c66fedc0	bcachefs: Rip out copygc pd controller We have a separate mechanism for ratelimiting copygc now - the pd controller has only been causing problems. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:54 -04:00
Kent Overstreet	220d206232	bcachefs: Fix an allocator startup race Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:54 -04:00
Kent Overstreet	59a7405161	bcachefs: Create allocator threads when allocating filesystem We're seeing failures to mount because of a failure to start the allocator threads, which currently happens fairly late in the mount process, after walking all metadata, and kthread_create() fails if something has tried to kill the mount process, which is probably not what we want. This patch avoids this issue by creating, but not starting, the allocator threads when we preallocate all of our other in memory data structures. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	fcb3431be8	bcachefs: Redo checks for sufficient devices When the replicas mechanism was added, for tracking data by which drives it's replicated on, the check for whether we have sufficient devices was never updated to make use of it. This patch finally does that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:53 -04:00
Kent Overstreet	4b8f89afd4	bcachefs: Fixes/improvements for journal entry reservations This fixes some arithmetic bugs in "bcachefs: Journal updates to dev usage" - additionally, it cleans things up by switching everything that goes in every journal entry to the journal_entry_res mechanism. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	180fb49dea	bcachefs: Journal updates to dev usage This eliminates the need to scan every bucket to regenerate dev_usage at mount time. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	2abe542087	bcachefs: Persist 64 bit io clocks Originally, bcachefs - going back to bcache - stored, for each bucket, a 16 bit counter corresponding to how long it had been since the bucket was read from. But, this required periodically rescaling counters on every bucket to avoid wraparound. That wasn't an issue in bcache, where we'd perodically rewrite the per bucket metadata all at once, but in bcachefs we're trying to avoid having to walk every single bucket. This patch switches to persisting 64 bit io clocks, corresponding to the 64 bit bucket timestaps introduced in the previous patch with KEY_TYPE_alloc_v2. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	5b593ee172	bcachefs: Add support for doing btree updates prior to journal replay Some errors may need to be fixed in order for GC to successfully run - walk and mark all metadata. But we can't start the allocators and do normal btree updates until after GC has completed, and allocation information is known to be consistent, so we need a different method of doing btree updates. Fortunately, we already have code for walking the btree while overlaying keys from the journal to be replayed. This patch adds an update path that adds keys to the list of keys to be replayed by journal replay, and also fixes up iterators. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	bfcf840ddf	bcachefs: Mark superblocks transactionally More work towards getting rid of the in memory struct bucket: this path adds code for marking superblock and journal buckets via the btree, and uses it in the device add and journal resize paths. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	72eab8da47	bcachefs: Refactor dev usage This is to make it more amenable for serialization. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:52 -04:00
Kent Overstreet	a5cd80ea99	bcachefs: Fix an assertion pop There was a race: btree node writes drop their reference on journal pins before clearing the btree_node_write_in_flight flag. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:51 -04:00
Kent Overstreet	f299d57350	bcachefs: Refactor filesystem usage accounting Various filesystem usage counters are kept in percpu counters, with one set per in flight journal buffer. Right now all the code that deals with it assumes that there's only two buffers/sets of counters, but the number of journal bufs is getting increased to 4 in the next patch - so refactor that code to not assume a constant. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:49 -04:00
Kent Overstreet	b7a9bbfc1b	bcachefs: Move journal reclaim to a kthread This is to make tracing easier. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:48 -04:00
Kent Overstreet	14ba3706b3	bcachefs: Add a kmem_cache for btree_key_cache objects We allocate a lot of these, and we're seeing sporading OOMs - this will help with tracking those down. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:47 -04:00
Kent Overstreet	a3e7226268	bcachefs: New varints Previous varint implementation used by the inode code was not nearly as fast as it could have been; partly because it was attempting to encode integers up to 96 bits (for timestamps) but this meant that encoding and decoding the length required a table lookup. Instead, we'll just encode timestamps greater than 64 bits as two separate varints; this will make decoding/encoding of inodes significantly faster overall. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:46 -04:00
Kent Overstreet	1a21bf9866	bcachefs: Add a single slot percpu buf for btree iters Allocating our array of btree iters is a big enough allocation that it hits the buddy allocator, and we're seeing lots of lock contention. Sticking a single element buffer in front of it should help. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:46 -04:00
Kent Overstreet	b5e8a6992f	bcachefs: Improved inode create optimization This shards new inodes into different btree nodes by using the processor ID for the high bits of the new inode number. Much faster than the previous inode create optimization - this also helps with sharding in the other btrees that index by inode number. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:46 -04:00
Kent Overstreet	2f33ece9b4	bcachefs: Minor journal reclaim improvement With the btree key cache code, journal reclaim now has a lot more work to do. It could be the case that after journal reclaim has finished one iteration there's already more work to do, so put it in a loop to check for that. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:45 -04:00
Kent Overstreet	45e4dcba79	bcachefs: Inode create optimization On workloads that do a lot of multithreaded creates all at once, lock contention on the inodes btree turns out to still be an issue. This patch adds a small buffer of inode numbers that are known to be free, so that we can avoid touching the btree on every create. Also, this changes inode creates to update via the btree key cache for the initial create. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:45 -04:00
Kent Overstreet	289980195f	bcachefs: Start/stop io clock hands in read/write paths This fixes a bug where the clock hands in the journal and superblock didn't match, because we were still incrementing the read clock hand while read-only. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:44 -04:00
Kent Overstreet	8d6b6222bf	bcachefs: Improvements to writing alloc info Now that we've got transactional alloc info updates (and have for awhile), we don't need to write it out on shutdown, and we don't need to write it out on startup except when GC found errors - this is a big improvement to mount/unmount performance. This patch also fixes a few bugs where we weren't writing out alloc info (on new filesystems, and new devices) and should have been. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:44 -04:00
Kent Overstreet	9f20ed157d	bcachefs: Fix copygc dying on startup The copygc threads errors out and makes the filesystem go RO if it ever tries to run and discovers it has no reserve allocated - which is a problem if it races with the allocator thread and its reserve hasn't been filled yet. The allocator thread doesn't start filling the copygc reserve until after BCH_FS_STARTED has been set, so make sure to wake up the allocator threads after setting that and before starting copygc. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:44 -04:00
Kent Overstreet	505b7a4c28	bcachefs: Fix errors early in the fs init process At some point bch2_fs_alloc() was changed to always call bch2_fs_free() in the error path, which means we need c->cl to always be initialized. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:44 -04:00
Kent Overstreet	d5e4dcc29c	bcachefs: Fix unmount path There was a long standing race in the mount/unmount code - the VFS intends for mount/unmount synchronizatino to be handled by the list of superblocks, but we were still holding devices open after tearing down our superblock in the unmount path. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:44 -04:00
Kent Overstreet	625104ea21	bcachefs: Don't fail mount if device has been removed Also - make sure to show the devices we actually have open in /proc Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:44 -04:00
Kent Overstreet	9f115ce9e9	bcachefs: Fix a bug with the journal_seq_blacklist mechanism Previously, we would start doing btree updates before writing the first journal entry; if this was after an unclean shutdown, this could cause those btree updates to not be blacklisted. Also, move some code to headers for userspace debug tools. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:43 -04:00
Kent Overstreet	74ed7e560b	bcachefs: Don't let copygc buckets be stolen by other threads And assorted other copygc fixes. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:43 -04:00
Kent Overstreet	e6d1161530	bcachefs: Make copygc thread global Per device copygc threads don't move data to different devices and they make fragmentation works - they don't make much sense anymore. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	89fd25be70	bcachefs: Use x-macros for data types Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	703e2a43bf	bcachefs: Move stripe creation to workqueue This is mainly to solve a lock ordering issue, and also simplifies the code a bit. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	ba6dd1dd49	bcachefs: Improve stripe triggers/heap code Soon we'll be able to modify existing stripes - replacing empty blocks with new blocks and new p/q blocks. This patch updates the trigger code to handle pointers changing in an existing stripe; also, it significantly improves how the stripes heap works, which means we can get rid of the stripe creation/deletion lock. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:42 -04:00
Kent Overstreet	5d20ba48f0	bcachefs: Use cached iterators for alloc btree Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	2ca88e5ad9	bcachefs: Btree key cache This introduces a new kind of btree iterator, cached iterators, which point to keys cached in a hash table. The cache also acts as a write cache - in the update path, we journal the update but defer updating the btree until the cached entry is flushed by journal reclaim. Cache coherency is for now up to the users to handle, which isn't ideal but should be good enough for now. These new iterators will be used for updating inodes and alloc info (the alloc and stripes btrees). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	1ada160618	bcachefs: Turn c->state_lock into an rwsem Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:41 -04:00
Kent Overstreet	a27443bc76	bcachefs: Kill old allocator startup code It's not needed anymore since we can now write to buckets before updating the alloc btree. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	039fc4c522	bcachefs: Fixes for going RO Now that interior btree updates are fully transactional, we don't need to write out alloc info in a loop. However, interior btree updates do put more things in the journal, so we still need a loop in the RO sequence. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	00b8ccf707	bcachefs: Interior btree updates are now fully transactional We now update the alloc info (bucket sector counts) atomically with journalling the update to the interior btree nodes, and we also set new btree roots atomically with the journalled part of the btree update. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	c823c3390b	bcachefs: Factor out bch2_fs_btree_interior_update_init() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	b29303966b	bcachefs: Fix reading of alloc info after unclean shutdown When updates to interior nodes started being journalled, that meant that after an unclean shutdown, until journal replay is done we can't walk the btree without overlaying the updates from the journal. The initial btree gc was changed to walk the btree overlaying keys from the journal - but bch2_alloc_read() and bch2_stripes_read() were missed. Major whoops... Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:40 -04:00
Kent Overstreet	2340fd9d27	bcachefs: Be more rigorous about marking the filesystem clean Previously, there was at least one error path where we could mark the filesystem clean when we hadn't sucessfully written out alloc info. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>	2023-10-22 17:08:39 -04:00

1 2 3

107 Commits