linux

iv/linux

Author	SHA1	Message	Date
Kent Overstreet	a1f0358b2b	bcache: Incremental gc Big garbage collection rewrite; now, garbage collection uses the same mechanisms as used elsewhere for inserting/updating btree node pointers, instead of rewriting interior btree nodes in place. This makes the code significantly cleaner and less fragile, and means we can now make garbage collection incremental - it doesn't have to hold a write lock on the root of the btree for the entire duration of garbage collection. This means that there's less of a latency hit for doing garbage collection, which means we can gc more frequently (and do a better job of reclaiming from the cache), and we can coalesce across more btree nodes (improving our space efficiency). Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:37 -08:00
Kent Overstreet	8835c1234d	bcache: Add make_btree_freeing_key() Refactoring, prep work for incremental garbage collection. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:37 -08:00
Kent Overstreet	f269af5a07	bcache: Add btree_node_write_sync() More refactoring - mostly making the interfaces more explicit about what we actually want to do. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:36 -08:00
Kent Overstreet	0eacac2203	bcache: PRECEDING_KEY() btree_insert_key() was open coding this, this is just refactoring. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:36 -08:00
Kent Overstreet	d5cc66e957	bcache: bch_(btree\|extent)_ptr_invalid() Trying to treat btree pointers and leaf node pointers the same way was a mistake - going to start being more explicit about the type of key/pointer we're dealing with. This is the first part of that refactoring; this patch shouldn't change any actual behaviour. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:35 -08:00
Kent Overstreet	3a3b6a4e07	bcache: Don't bother with bucket refcount for btree node allocations The bucket refcount (dropped with bkey_put()) is only needed to prevent the newly allocated bucket from being garbage collected until we've added a pointer to it somewhere. But for btree node allocations, the fact that we have btree nodes locked is enough to guard against races with garbage collection. Eventually the per bucket refcount is going to be replaced with something specific to bch_alloc_sectors(). Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:34 -08:00
Kent Overstreet	280481d06c	bcache: Debug code improvements Couple changes: * Consolidate bch_check_keys() and bch_check_key_order(), and move the checks that only check_key_order() could do to bch_btree_iter_next(). * Get rid of CONFIG_BCACHE_EDEBUG - now, all that code is compiled in when CONFIG_BCACHE_DEBUG is enabled, and there's now a sysfs file to flip on the EDEBUG checks at runtime. * Dropped an old not terribly useful check in rw_unlock(), and refactored/improved a some of the other debug code. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:34 -08:00
Kent Overstreet	e58ff15503	bcache: Fix bch_ptr_bad() Previously, bch_ptr_bad() could return false when there was a pointer to a nonexistant device... it only filtered out keys with PTR_CHECK_DEV pointers. This behaviour was intended for multiple cache device support; for that, just because the device for one of the pointers has gone away doesn't mean we want to filter out the rest of the pointers. But we don't yet explicitly filter/check individual pointers, so without that this behaviour was wrong - a corrupt bkey with a bad device pointer could cause us to deref a bad pointer. Doh. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:33 -08:00
Kent Overstreet	81ab4190ac	bcache: Pull on disk data structures out into a separate header Now, the on disk data structures are in a header that can be exported to userspace - and having them all centralized is nice too. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:33 -08:00
Kent Overstreet	2599b53b7b	bcache: Move sector allocator to alloc.c Just reorganizing things a bit. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:32 -08:00
Kent Overstreet	220bb38c21	bcache: Break up struct search With all the recent refactoring around struct btree op struct search has gotten rather large. But we can now easily break it up in a different way - we break out struct btree_insert_op which is for inserting data into the cache, and that's now what the copying gc code uses - struct search is now specific to request.c Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:32 -08:00
Kent Overstreet	cc7b881921	bcache: Convert bch_btree_insert() to bch_btree_map_leaf_nodes() Last of the btree_map() conversions. Main visible effect is bch_btree_insert() is no longer taking a struct btree_op as an argument anymore - there's no fancy state machine stuff going on, it's just a normal function. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:31 -08:00
Kent Overstreet	6054c6d4da	bcache: Don't use op->insert_collision When we convert bch_btree_insert() to bch_btree_map_leaf_nodes(), we won't be passing struct btree_op to bch_btree_insert() anymore - so we need a different way of returning whether there was a collision (really, a replace collision). Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:30 -08:00
Kent Overstreet	1b207d80d5	bcache: Kill op->replace This is prep work for converting bch_btree_insert to bch_btree_map_leaf_nodes() - we have to convert all its arguments to actual arguments. Bunch of churn, but should be straightforward. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:29 -08:00
Kent Overstreet	faadf0c965	bcache: Drop some closure stuff With a the recent bcache refactoring, some of the closure code isn't needed anymore. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:10 -08:00
Kent Overstreet	b54d6934da	bcache: Kill op->cl This isn't used for waiting asynchronously anymore - so this is a fairly trivial refactoring. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:09 -08:00
Kent Overstreet	c18536a72d	bcache: Prune struct btree_op Eventual goal is for struct btree_op to contain only what is necessary for traversing the btree. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:08 -08:00
Kent Overstreet	cc23196631	bcache: Clean up cache_lookup_fn There was some looping in submit_partial_cache_hit() and submit_partial_cache_hit() that isn't needed anymore - originally, we wouldn't necessarily process the full hit or miss all at once because when splitting the bio, we took into account the restrictions of the device we were sending it to. But, device bio size restrictions are now handled elsewhere, with a wrapper around generic_make_request() - so that looping has been unnecessary for awhile now and we can now do quite a bit of cleanup. And if we trim the key we're reading from to match the subset we're actually reading, we don't have to explicitly calculate bi_sector anymore. Neat. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:08 -08:00
Kent Overstreet	2c1953e201	bcache: Convert bch_btree_read_async() to bch_btree_map_keys() This is a fairly straightforward conversion, mostly reshuffling - op->lookup_done goes away, replaced by MAP_DONE/MAP_CONTINUE. And the code for handling cache hits and misses wasn't really btree code, so it gets moved to request.c. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:07 -08:00
Kent Overstreet	df8e89701f	bcache: Move some stuff to btree.c With the new btree_map() functions, we don't need to export the stuff needed for traversing the btree anymore. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:07 -08:00
Kent Overstreet	48dad8baf9	bcache: Add btree_map() functions Lots of stuff has been open coding its own btree traversal - which is generally pretty simple code, but there are a few subtleties. This adds new new functions, bch_btree_map_nodes() and bch_btree_map_keys(), which do the traversal for you. Everything that's open coding btree traversal now (with the exception of garbage collection) is slowly going to be converted to these two functions; being able to write other code at a higher level of abstraction is a big improvement w.r.t. overall code quality. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:06 -08:00
Kent Overstreet	5e6926daac	bcache: Convert writeback to a kthread This simplifies the writeback flow control quite a bit - previously, it was conceptually two coroutines, refill_dirty() and read_dirty(). This makes the code quite a bit more straightforward. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:05 -08:00
Kent Overstreet	72a44517f3	bcache: Convert gc to a kthread We needed a dedicated rescuer workqueue for gc anyways... and gc was conceptually a dedicated thread, just one that wasn't running all the time. Switch it to a dedicated thread to make the code a bit more straightforward. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:04 -08:00
Kent Overstreet	35fcd848d7	bcache: Convert bucket_wait to wait_queue_head_t At one point we did do fancy asynchronous waiting stuff with bucket_wait, but that's all gone (and bucket_wait is used a lot less than it used to be). So use the standard primitives. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:04 -08:00
Kent Overstreet	e8e1d4682c	bcache: Convert try_wait to wait_queue_head_t We never waited on c->try_wait asynchronously, so just use the standard primitives. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:03 -08:00
Kent Overstreet	0b93207abb	bcache: Move keylist out of btree_op Slowly working on pruning struct btree_op - the aim is for it to only contain things that are actually necessary for traversing the btree. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:02 -08:00
Kent Overstreet	a34a8bfd4e	bcache: Refactor journalling flow control Making things less asynchronous that don't need to be - bch_journal() only has to block when the journal or journal entry is full, which is emphatically not a fast path. So make it a normal function that just returns when it finishes, to make the code and control flow easier to follow. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:02 -08:00
Kent Overstreet	cdd972b164	bcache: Refactor read request code a bit More refactoring, and renaming. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:01 -08:00
Kent Overstreet	84f0db03ea	bcache: Refactor request_write() Try to improve some of the naming a bit to be more consistent, and also improve the flow of control in request_write() a bit. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:00 -08:00
Kent Overstreet	c2f95ae2eb	bcache: Clean up keylist code More random refactoring. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:56:00 -08:00
Kent Overstreet	4f3d40147b	bcache: Add explicit keylist arg to btree_insert() Some refactoring - better to explicitly pass stuff around instead of having it all in the "big bag of state", struct btree_op. Going to prune struct btree_op quite a bit over time. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:59 -08:00
Kent Overstreet	e7c590eb63	bcache: Convert btree_insert_check_key() to btree_insert_node() This was the main point of all this refactoring - now, btree_insert_check_key() won't fail just because the leaf node happened to be full. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:59 -08:00
Kent Overstreet	403b6cdeb1	bcache: Insert multiple keys at a time We'll often end up with a list of adjacent keys to insert - because bch_data_insert() may have to fragment the data it writes. Originally, to simplify things and avoid having to deal with corner cases bch_btree_insert() would pass keys from this list one at a time to btree_insert_recurse() - mainly because the list of keys might span leaf nodes, so it was easier this way. With the btree_insert_node() refactoring, it's now a lot easier to just pass down the whole list and have btree_insert_recurse() iterate over leaf nodes until it's done. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:58 -08:00
Kent Overstreet	26c949f806	bcache: Add btree_insert_node() The flow of control in the old btree insertion code was rather - backwards; we'd recurse down the btree (in btree_insert_recurse()), and then if we needed to split the keys to be inserted into the parent node would be effectively returned up to btree_insert_recurse(), which would notice there was more work to do and finish the insertion. The main problem with this was that the full logic for btree insertion could only be used by calling btree_insert_recurse; if you'd gotten to a btree leaf some other way and had a key to insert, if it turned out that node needed to be split you were SOL. This inverts the flow of control so btree_insert_node() does _full_ btree insertion, including splitting - and takes a (leaf) btree node to insert into as a parameter. This means we can now _correctly_ handle cache misses - for cache misses, we need to insert a fake "check" key into the btree when we discover we have a cache miss - while we still have the btree locked. Previously, if the btree node was full inserting a cache miss would just fail. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:57 -08:00
Kent Overstreet	d6fd3b11ce	bcache: Explicitly track btree node's parent This is prep work for the reworked btree insertion code. The way we set b->parent is ugly and hacky... the problem is, when btree_split() or garbage collection splits or rewrites a btree node, the parent changes for all its (potentially already cached) children. I may change this later and add some code to look through the btree node cache and find all our cached child nodes and change the parent pointer then... Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:57 -08:00
Kent Overstreet	8304ad4dc8	bcache: Remove unnecessary check in should_split() Checking i->seq was redundant, because since ages ago we always initialize the new bset when advancing b->written Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:56 -08:00
Kent Overstreet	2d679fc756	bcache: Stripe size isn't necessarily a power of two Originally I got this right... except that the divides didn't use do_div(), which broke 32 bit kernels. When I went to fix that, I forgot that the raid stripe size usually isn't a power of two... doh Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:55 -08:00
Kent Overstreet	77c320eb46	bcache: Add on error panic/unregister setting Works kind of like the ext4 setting, to panic or remount read only on errors. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:55 -08:00
Kent Overstreet	49b1212dfa	bcache: Use blkdev_issue_discard() The old asynchronous discard code was really a relic from when all the allocation code was asynchronous - now that allocation runs out of a dedicated thread there's no point in keeping around all that complicated machinery. Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:54 -08:00
Kent Overstreet	dd9ec84da5	bcache: Fix a lockdep splat bch_keybuf_del() takes a spinlock that can't be taken in interrupt context - whoops. Fortunately, this code isn't enabled by default (you have to toggle a sysfs thing). Signed-off-by: Kent Overstreet <kmo@daterainc.com>	2013-11-10 21:55:54 -08:00
Kent Overstreet	7857d5d470	bcache: Fix a journalling performance bug	2013-11-10 21:55:53 -08:00
Kent Overstreet	1fa8455deb	bcache: Fix dirty_data accounting Dirty data accounting wasn't quite right - firstly, we were adding the key we're inserting after it could have merged with another dirty key already in the btree, and secondly we could sometimes pass the wrong offset to bcache_dev_sectors_dirty_add() for dirty data we were overwriting - which is important when tracking dirty data by stripe. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org> # >= v3.10	2013-11-10 21:55:27 -08:00
Joe Thornber	c9d28d5d09	dm cache: promotion optimisation for writes If a write block triggers promotion and covers a whole block we can avoid a copy. Introduce dm_{hook,unhook}_bio to simplify saving and restoring bio fields (bi_private is now used by overwrite). Switch writethrough support over to using these helpers too. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2013-11-09 18:20:26 -05:00
Joe Thornber	c86c30706c	dm cache: be much more aggressive about promoting writes to discarded blocks Previously these promotions only got priority if there were unused cache blocks. Now we give them priority if there are any clean blocks in the cache. The fio_soak_test in the device-mapper-test-suite now gives uniform performance across subvolumes (~16 seconds). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2013-11-09 18:20:25 -05:00
Joe Thornber	01911c19be	dm cache policy mq: implement writeback_work() and mq_{set,clear}_dirty() There are now two multiqueues for in cache blocks. A clean one and a dirty one. writeback_work comes from the dirty one. Demotions come from the clean one. There are two benefits: - Performance improvement, since demoting a clean block is a noop. - The cache cleans itself when io load is light. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2013-11-09 18:20:25 -05:00
Heinz Mauelshagen	ffcbcb6720	dm cache: optimize commit_if_needed Check commit_requested flag _before_ calling dm_cache_changed_this_transaction() superfluously. Also, be sure to set last_commit_jiffies _after_ dm_cache_commit() completes. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2013-11-09 18:20:24 -05:00
Joe Thornber	40c57f475f	dm space map disk: optimise sm_disk_dec_block Don't waste time spotting blocks that have been allocated and then freed in the same transaction. The extra lookup is expensive, and I don't think it really gives us much. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2013-11-09 18:20:24 -05:00
Mikulas Patocka	5442851edb	dm: fix Kconfig menu indentation The option DM_LOG_USERSPACE is sub-option of DM_MIRROR, so place it right after DM_MIRROR. Doing so fixes various other Device mapper targets/features to be properly nested under "Device mapper support". Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2013-11-09 18:20:22 -05:00
Mikulas Patocka	2c140a246d	dm: allow remove to be deferred This patch allows the removal of an open device to be deferred until it is closed. (Previously such a removal attempt would fail.) The deferred remove functionality is enabled by setting the flag DM_DEFERRED_REMOVE in the ioctl structure on DM_DEV_REMOVE or DM_REMOVE_ALL ioctl. On return from DM_DEV_REMOVE, the flag DM_DEFERRED_REMOVE indicates if the device was removed immediately or flagged to be removed on close - if the flag is clear, the device was removed. On return from DM_DEV_STATUS and other ioctls, the flag DM_DEFERRED_REMOVE is set if the device is scheduled to be removed on closure. A device that is scheduled to be deleted can be revived using the message "@cancel_deferred_remove". This message clears the DMF_DEFERRED_REMOVE flag so that the device won't be deleted on close. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2013-11-09 18:20:22 -05:00
Mike Snitzer	7833b08e18	dm table: print error on preresume failure If preresume fails it is worth logging an error given that a device is left suspended due to the failure. This change was motivated by local preresume error logging that was added to the cache target ("preresume failed"). Elevating this target-agnostic context for the where the target-specific error occurred relative to the DM core's callouts makes sense. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com>	2013-11-09 18:20:21 -05:00

... 3 4 5 6 7 ...

3154 Commits