clang had a few more warnings about enum conversion, and also didn't
like the opts.c initializer.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- endianness fixes
- mark some things static
- fix a few __percpu annotations
- fix silent enum conversions
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a new helper for the common pattern of:
- trans_unlock()
- do something
- trans_relock()
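One way to express this is a macro that runs an expression with the
transaction unlocked and relocks afterwards. A minimal sketch, assuming
bch2_trans_relock() returns 0 or a transaction-restart error (the macro
name here is illustrative):

  #define drop_locks_do(_trans, _do)                                    \
  ({                                                                    \
          bch2_trans_unlock(_trans);                                    \
          (_do) ?: bch2_trans_relock(_trans);                           \
  })

Callers can then write ret = drop_locks_do(trans, fn(...)); instead of
open-coding the unlock/relock pair.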
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
bch2_btree_trans_to_text() is used on btree_trans objects that are owned
by different threads - when printing out deadlock cycles - so we need a
safe version of trans_for_each_path(), else we race with seeing a
btree_path that was just allocated and not fully initialized.
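A minimal sketch of the idea, assuming a publish counter that the
allocation path bumps with a release store only once a path is fully set
up (nr_paths_published and the macro name are illustrative, not the real
btree_trans layout):

  #define trans_for_each_path_safe(_trans, _path, _nr)                  \
          for (_nr = smp_load_acquire(&(_trans)->nr_paths_published),   \
               (_path) = (_trans)->paths;                               \
               (_path) < (_trans)->paths + (_nr);                       \
               (_path)++)

The acquire/release pairing guarantees that any path we iterate over was
fully initialized before it was published.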
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
six_lock_pcpu_alloc() is an unsafe interface: it's not safe to allocate
or free the percpu reader count on an existing lock that's in use; the
only safe time to allocate percpu readers is when the lock is first
being initialized.
This patch adds a flags parameter to six_lock_init(), and instead of
six_lock_pcpu_free() we now expose six_lock_exit(), which does the same
thing but is less likely to be misused.
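A sketch of how the reworked interface might read (the flag name is an
assumption, not taken from the patch):

  enum six_lock_init_flags {
          SIX_LOCK_INIT_PCPU = 1U << 0,   /* allocate percpu reader counts now */
  };

  void six_lock_init(struct six_lock *lock, enum six_lock_init_flags flags);

  /* replaces six_lock_pcpu_free(); releases whatever six_lock_init() allocated */
  void six_lock_exit(struct six_lock *lock);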
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This moves a helper out of the bcachefs code that shouldn't have been
there, since it touches six lock internals.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This fixes some confusion in the lockdep code due to initializing btree
node/key cache locks with the same lockdep key, but different names.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This adds a new helper, bch2_trans_mutex_lock(), for locking a mutex -
dropping and retaking btree locks as needed.
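A minimal sketch of such a helper, assuming bch2_trans_relock() returns
0 or a transaction-restart error:

  int bch2_trans_mutex_lock(struct btree_trans *trans, struct mutex *lock)
  {
          int ret = 0;

          if (!mutex_trylock(lock)) {
                  bch2_trans_unlock(trans);
                  mutex_lock(lock);

                  ret = bch2_trans_relock(trans);
                  if (ret)        /* restarting: don't return holding the mutex */
                          mutex_unlock(lock);
          }
          return ret;
  }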
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
When we unlock in order to submit IO, the next relock event is likely to
fail if submit_bio() blocked - we shouldn't count those events in our
_fail stats, since they're expected and shouldn't cause test failures.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This uses the new _ip() interface to six locks and hooks it up to
btree_path->ip_allocated, when available.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
- Marking a non-static function as inline doesn't actually work and is
now causing problems - drop that
- Introduce BCACHEFS_LOG_PREFIX for when we want to prefix log messages
with bcachefs (filesystem name)
- Userspace doesn't have real percpu variables (maybe we can get this
fixed someday), so put an #ifdef around the bch2_disk_reservation_add()
fastpath
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
checkpatch.pl gives lots of warnings that we don't want - suggested
ignore list:
ASSIGN_IN_IF
UNSPECIFIED_INT - bcachefs coding style prefers single token type names
NEW_TYPEDEFS - typedefs are occasionally good
FUNCTION_ARGUMENTS - we prefer to look at functions in .c files
  (hopefully with docbook documentation), not .h file prototypes
MULTISTATEMENT_MACRO_USE_DO_WHILE - we have _many_ x-macros and other
  macros where we can't do this
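These could be made the tree's defaults via checkpatch's config file,
e.g. a .checkpatch.conf containing:

  --ignore ASSIGN_IN_IF,UNSPECIFIED_INT,NEW_TYPEDEFS,FUNCTION_ARGUMENTS,MULTISTATEMENT_MACRO_USE_DO_WHILE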
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We were removing 1 more entry than we were supposed to - oops.
Also some other simplifications and cleanups, and bring back the abort
preference code in a better fashion.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We'd like to prioritize aborting transactions that have done less work -
however, it appears breaking cycles by telling other threads to abort
may still be buggy, so disable that for now.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Some lock operations can't fail; a cycle of nofail locks is impossible
to recover from. So we want to get rid of these nofail locking
operations, but as this is tricky it'll be done incrementally.
If such a cycle happens, this patch prints out which codepaths are
involved so we know what to work on next.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
This changes bch2_check_for_deadlock() to print the longest chains it
finds - when we have a deadlock because the cycle detector isn't finding
something, this will let us see what it's missing.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Most of the node_relock_fail trace events are generated from
bch2_btree_path_verify_level(), when debugcheck_iterators is enabled -
but we're not interested in these trace events; they don't indicate that
we're in a slowpath.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In order for bch2_btree_node_lock_write_nofail() to never produce a
deadlock, we must ensure we're never holding read locks when using it.
Fortunately, it's only used from code paths where any read locks may be
safely dropped.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the event that we're not finished debugging the cycle detector, this
adds a new file to debugfs that shows what the cycle detector finds, if
anything. By comparing this with btree_transactions, which shows held
locks for every btree_transaction, we'll be able to determine if it's
the cycle detector that's buggy or something else.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
We've outgrown our own deadlock avoidance strategy.
The btree iterator API provides an interface where the user doesn't need
to concern themselves with lock ordering - different btree iterators can
be traversed in any order. Without special care, this will lead to
deadlocks.
Our previous strategy was to define a lock ordering internally, and
whenever we attempt to take a lock and trylock() fails, we'd check if
the current btree transaction is holding any locks that cause a lock
ordering violation. If so, we'd issue a transaction restart, and then
bch2_trans_begin() would re-traverse all previously used iterators, but
in the correct order.
That approach had some issues, though.
- Sometimes we'd issue transaction restarts unnecessarily, when no
deadlock would have actually occurred. Lock ordering restarts have
become our primary cause of transaction restarts, on some workloads
totalling 20% of actual transaction commits.
- To avoid deadlock or livelock, we'd often have to take intent locks
when we only wanted a read lock: with the lock ordering approach, it
is actually illegal to hold _any_ read lock while blocking on an intent
lock, and this has been causing us unnecessary lock contention.
- It was getting fragile - the various lock ordering rules are not
trivial, and we'd been seeing occasional livelock issues related to
this machinery.
So, since bcachefs is already a relational database masquerading as a
filesystem, we're stealing the next traditional database technique and
switching to a cycle detector for avoiding deadlocks.
When we block taking a btree lock, after adding ourselves to the waitlist
but before sleeping, we do a DFS of btree transactions waiting on other
btree transactions, starting with the current transaction and walking
our held locks, and transactions blocking on our held locks.
If we find a cycle, we emit a transaction restart. Occasionally (e.g. in
the btree split path) we cannot allow the lock() operation to fail, so
if necessary we'll tell another transaction that it has to fail.
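In pseudocode, the walk might look like this (an illustrative sketch
only - the lock/waiter iterators and types are made up, and real code
would also need to bound the search and guard the graph against
concurrent changes):

  /* Does any chain of waiters starting at @trans lead back to @start? */
  static bool wait_graph_has_cycle(struct btree_trans *start,
                                   struct btree_trans *trans)
  {
          struct held_lock *l;
          struct btree_trans *waiter;

          for_each_held_lock(trans, l)            /* locks @trans holds */
                  for_each_lock_waiter(l, waiter) /* transactions blocked on @l */
                          if (waiter == start ||
                              wait_graph_has_cycle(start, waiter))
                                  return true;
          return false;
  }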
Result: trans_restart_would_deadlock events are reduced by a factor of
10 to 100, and we'll be able to delete a whole bunch of grotty, fragile
code.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Previously, if we were trying to upgrade from a read to an intent lock
but we held an additional read lock via another btree_path,
bch2_btree_node_upgrade() would always fail, in six_lock_tryupgrade().
This patch factors out the code that __bch2_btree_node_lock_write() uses
to temporarily drop extra read locks, so that six_lock_tryupgrade() can
succeed.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
A bit of tidying up; this makes the counters a little clearer as to
what's happening.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Ideally, all the code in btree_locking.c should be converted, but then
we'd want to convert btree_path to point to btree_key_cached_common too,
and then we'd be in for a much bigger cleanup - but a bit of incremental
cleanup will still be helpful for the next patches.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Add a type descriptor to btree_bkey_cached_common - there's no reason
not to since we've got padding that was otherwise unused, and this is a
nice cleanup (and helpful in later patches).
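One plausible shape, with the field layout assumed rather than taken
from the patch:

  struct btree_bkey_cached_common {
          struct six_lock lock;
          u8              level;
          u8              btree_id;
          bool            cached;   /* btree node or key cache entry? */
  };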
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Have to be careful with bit fields - when subtracting, this was
overflowing into the write_locking bit.
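To illustrate the hazard with a contrived example (not the actual code):
raw arithmetic on the containing word can borrow into a neighboring
field, while an update through the bit field cannot:

  union lock_state {
          u64     v;
          struct {
                  u64     readers:32,
                          write_locking:1;
          };
  };

  static void drop_read_lock_buggy(union lock_state *s)
  {
          s->v -= 1;      /* if readers was 0, the borrow flips write_locking */
  }

  static void drop_read_lock(union lock_state *s)
  {
          s->readers--;   /* wraps within the field; write_locking untouched */
  }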
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
In the future, with the new deadlock cycle detector, we won't be using
bare six_lock_* anymore: lock wait entries will all be embedded in
btree_trans, and we will need a btree_trans context whenever locking a
btree node.
This patch plumbs a btree_trans to the few places that need it, and adds
two new locking functions:
- btree_node_lock_nopath, which may fail returning a transaction
restart, and
- btree_node_lock_nopath_nofail, to be used in places where we know we
cannot deadlock (i.e. because we're holding no other locks).
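Plausible signatures for the pair (the parameter lists here are
assumptions):

  /* may fail, returning a transaction-restart error: */
  int btree_node_lock_nopath(struct btree_trans *trans,
                             struct btree_bkey_cached_common *b,
                             enum six_lock_type type);

  /* only for callers that provably hold no other locks: */
  void btree_node_lock_nopath_nofail(struct btree_trans *trans,
                                     struct btree_bkey_cached_common *b,
                                     enum six_lock_type type);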
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Also, do some reorganizing/renaming, convert atomic counters in bch_fs
to persistent counters, and add a few missing counters.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
It now prints the error name when the btree node is an error pointer;
also, don't trace failures when the btree node is
BCH_ERR_no_btree_node_up.
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
The upcoming lock cycle detection code will need to know precisely which
locks every btree_trans is holding, including write locks - this patch
updates btree_node_locked_type to include write locks.
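For instance, the lock-type values could mirror enum six_lock_type
directly, with the write state as the new addition (a sketch, not
necessarily the exact definition):

  enum btree_node_locked_type {
          BTREE_NODE_UNLOCKED       = -1,
          BTREE_NODE_READ_LOCKED    = SIX_LOCK_read,
          BTREE_NODE_INTENT_LOCKED  = SIX_LOCK_intent,
          BTREE_NODE_WRITE_LOCKED   = SIX_LOCK_write,   /* new */
  };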
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
six_lock_count now also counts whether a write lock is held, and this
patch also corrects the counting of six_lock->intent_lock_recurse.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Held btree locks are tracked in btree_path->nodes_locked and
btree_path->nodes_intent_locked. Upcoming patches are going to change
the representation in struct btree_path, so this patch switches to
proper helpers instead of direct access to these fields.
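With the current representation - one bit per level in each field - such
helpers might look like this (hypothetical sketch):

  static inline bool btree_node_locked(struct btree_path *path, unsigned level)
  {
          return path->nodes_locked & (1U << level);
  }

  static inline bool btree_node_intent_locked(struct btree_path *path, unsigned level)
  {
          return path->nodes_intent_locked & (1U << level);
  }

  static inline void mark_btree_node_unlocked(struct btree_path *path, unsigned level)
  {
          path->nodes_locked        &= ~(1U << level);
          path->nodes_intent_locked &= ~(1U << level);
  }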
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Start to centralize some of the locking code in a new file; more locking
code will be moving here in the future.
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>