linux

iv/linux

Author	SHA1	Message	Date
Guoqing Jiang	c84400c89f	md-cluster/bitmap: unplug bitmap to sync dirty pages to disk This patch is doing two distinct but related things. 1. It adds bitmap_unplug() for the main bitmap (mddev->bitmap). As bit have been set, BITMAP_PAGE_DIRTY is set so bitmap_deamon_work() will not write those pages out in its regular scans, only bitmap_unplug() will. If there are no writes to the array, bitmap_unplug() won't be called, so we need to call it explicitly here. 2. bitmap_write_all() is a bit of a confusing interface as it doesn't actually write anything. The current code for writing "bitmap" works but this change makes it a bit clearer. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	23cea66a37	md-cluster/bitmap: fix wrong page num in bitmap_file_clear_bit and bitmap_file_set_bit The pnum passed to set_page_attr and test_page_attr should from 0 to storage.file_pages - 1, but bitmap_file_set_bit and bitmap_file_clear_bit call set_page_attr and test_page_attr with page->index parameter while page->index has already added node_offset before. So we need to minus node_offset in both bitmap_file_clear_bit and bitmap_file_set_bit. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	7f86ffed9b	md-cluster/bitmap: fix wrong calcuation of offset The offset is wrong in bitmap_storage_alloc, we should set it like below in bitmap_init_from_disk(). node_offset = bitmap->cluster_slot * (DIV_ROUND_UP(store->bytes, PAGE_SIZE)); Because 'offset' is only assigned to 'page->index' and that is usually over-written by read_sb_page. So it does not cause problem in general, but it still need to be fixed. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	18c9ff7f48	md-cluster: sync bitmap when node received RESYNCING msg If the node received RESYNCING message which means another node will perform resync with the area, then we don't want to do it again in another node. Let's set RESYNC_MASK and clear NEEDED_MASK for the region from old-low to new-low which has finished syncing, and the region from old-hi to new-hi is about to syncing, bitmap_sync_with_cluste is introduced for the purpose. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	c9d6503228	md-cluster: always setup in-memory bitmap The in-memory bitmap for raid is allocated on demand, then for cluster scenario, it is possible that slave node which received RESYNCING message doesn't have the in-memory bitmap when master node is perform resyncing, so we can't make bitmap is match up well among each nodes. So for cluster scenario, we need always preserve the bitmap, and ensure the page will not be freed. And a no_hijack flag is introduced to both bitmap_checkpage and bitmap_get_counter, which makes cluster raid returns fail once allocate failed. And the next patch is relied on this change since it keeps sync bitmap among each nodes during resyncing stage. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	a578183ed9	md-cluster: wakeup thread if activated a spare disk When a device is re-added, it will ultimately need to be activated and that happens in md_check_recovery, so we need to set MD_RECOVERY_NEEDED right after remove_and_add_spares. A specifical issue without the change is that when one node perform fail/remove/readd on a disk, but slave nodes could not add the disk back to array as expected (added as missed instead of in sync). So give slave nodes a chance to do resync. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	ab5a98b132	md-cluster: change array_sectors and update size are not supported Currently, some features are not supported yet, such as change array_sectors and update size, so return EINVAL for them and listed it in document. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	1535212c54	md-cluster: fix locking when node joins cluster during message broadcast If a node joins the cluster while a message broadcast is under way, a lock issue could happen as follows. For a cluster which included two nodes, if node A is calling __sendmsg before up-convert CR to EX on ack, and node B released CR on ack. But if a new node C joins the cluster and it doesn't receive the message which A sent before, so it could hold CR on ack before A up-convert CR to EX on ack. So a node joining the cluster should get an EX lock on the "token" first to ensure no broadcast is ongoing, then release it after held CR on ack. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	5b0fb33e8a	md-cluster: unregister thread if err happened The two threads need to be unregistered if a node can't join cluster successfully. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	eb315cd093	md-cluster: wake up thread to continue recovery In recovery case, we need to set MD_RECOVERY_NEEDED and wake up thread only if recover is not finished. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	2c97cf1385	md-cluser: make resync_finish only called after pers->sync_request It is not reasonable that cluster raid to release resync lock before the last pers->sync_request has finished. As the metadata will be changed when node performs resync, we need to inform other nodes to update metadata, so the MD_CHANGE_PENDING flag is set before finish resync. Then metadata_update_finish is move ahead to ensure that METADATA_UPDATED msg is sent before finish resync, and metadata_update_start need to be run after "repeat:" label accordingly. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Guoqing Jiang	41a9a0dcf8	md-cluster: change resync lock from asynchronous to synchronous If multiple nodes choose to attempt do resync at the same time they need to be serialized so they don't duplicate effort. This serialization is done by locking the 'resync' DLM lock. Currently if a node cannot get the lock immediately it doesn't request notification when the lock becomes available (i.e. DLM_LKF_NOQUEUE is set), so it may not reliably find out when it is safe to try again. Rather than trying to arrange an async wake-up when the lock becomes available, switch to using synchronous locking - this is a lot easier to think about. As it is not permitted to block in the 'raid1d' thread, move the locking to the resync thread. So the rsync thread is forked immediately, but it blocks until the resync lock is available. Once the lock is locked it checks again if any resync action is needed. A particular symptom of the current problem is that a node can get stuck with "resync=pending" indefinitely. Reviewed-by: NeilBrown <neilb@suse.com> Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-05-04 12:39:35 -07:00
Linus Torvalds	98bcf28636	Merge tag 'md/4.6-rc6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md Pull MD fixes from Shaohua Li: "This update includes several trival fixes. The only important one is to fix MD bio merge, which has big performance impact" * tag 'md/4.6-rc6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: raid5: delete unnecessary warnning MD: make bio mergeable md/raid0: remove empty line printk from dump_zones md/raid0: fix uninitialized variable bug	2016-05-02 12:22:51 -07:00
Shaohua Li	b8a0b8e946	raid5: delete unnecessary warnning If device has R5_LOCKED set, it's legit device has R5_SkipCopy set and page != orig_page. After R5_LOCKED is clear, handle_stripe_clean_event will clear the SkipCopy flag and set page to orig_page. So the warning is unnecessary. Reported-by: Joey Liao <joeyliao@qnap.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-04-29 14:18:03 -07:00
Shaohua Li	9c573de328	MD: make bio mergeable blk_queue_split marks bio unmergeable, which makes sense for normal bio. But if dispatching the bio to underlayer disk, the blk_queue_split checks are invalid, hence it's possible the bio becomes mergeable. In the reported bug, this bug causes trim against raid0 performance slash https://bugzilla.kernel.org/show_bug.cgi?id=117051 Reported-and-tested-by: Park Ju Hyung <qkrwngud825@gmail.com> Fixes: 6ac45aeb6bca(block: avoid to merge splitted bio) Cc: stable@vger.kernel.org (v4.3+) Cc: Ming Lei <ming.lei@canonical.com> Cc: Neil Brown <neilb@suse.de> Reviewed-by: Jens Axboe <axboe@fb.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-04-25 18:21:33 -07:00
Michał Pecio	b297874a2d	md/raid0: remove empty line printk from dump_zones Remove the final printk. All preceding output is already properly newline-terminated and the printk isn't even KERN_CONT to begin with, so it only adds one empty line to the log. Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-04-25 08:43:58 -07:00
Ahmed Samy	6545b60baa	dm cache metadata: fix cmd_read_lock() acquiring write lock Commit `9567366fef` ("dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros") uses down_write() instead of down_read() in cmd_read_lock(), yet up_read() is used to release the lock in READ_UNLOCK(). Fix it. Fixes: `9567366fef` ("dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros") Cc: stable@vger.kernel.org Signed-off-by: Ahmed Samy <f.fallen45@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-04-17 11:24:46 -04:00
Mike Snitzer	9567366fef	dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros The READ_LOCK macro was incorrectly returning -EINVAL if dm_bm_is_read_only() was true -- it will always be true once the cache metadata transitions to read-only by dm_cache_metadata_set_read_only(). Wrap READ_LOCK and WRITE_LOCK multi-statement macros in do {} while(0). Also, all accesses of the 'cmd' argument passed to these related macros are now encapsulated in parenthesis. A follow-up patch can be developed to eliminate the use of macros in favor of pure C code. Avoiding that now given that this needs to apply to stable@. Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Fixes: `d14fcf3dd7` ("dm cache: make sure every metadata function checks fail_io") Cc: stable@vger.kernel.org	2016-04-14 17:34:49 -04:00
Dan Carpenter	7dedd15dd2	md/raid0: fix uninitialized variable bug If this function fails the callers expect that *private_conf is set to an ERR_PTR() but that isn't true for the first error path where we can't allocate "conf". It leads to some uninitialized variable bugs. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-04-14 09:57:59 -07:00
Mikulas Patocka	072623de1f	dm: fix dm_target_io leak if clone_bio() returns an error Commit `c80914e81e` ("dm: return error if bio_integrity_clone() fails in clone_bio()") changed clone_bio() such that if it does return error then the alloc_tio() created resources (both the bio that was allocated to be a clone and the containing dm_target_io struct) will leak. Fix this by calling free_tio() in __clone_and_map_data_bio()'s clone_bio() error path. Fixes: `c80914e81e` ("dm: return error if bio_integrity_clone() fails in clone_bio()") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-04-11 11:49:09 -04:00
Linus Torvalds	63b106a87d	Merge tag 'md/4.6-rc2-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md Pull MD fixes from Shaohua Li: "This update mainly fixes bugs: - fix error handling (Guoqing) - fix a crash when a disk is hotremoved (me) - fix a dead loop (Wei Fang)" * tag 'md/4.6-rc2-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: md/bitmap: clear bitmap if bitmap_create failed MD: add rdev reference for super write md: fix a trivial typo in comments md:raid1: fix a dead loop when read from a WriteMostly disk	2016-04-09 11:23:27 -07:00
Kirill A. Shutemov	09cbfeaf1a	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced long time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-04-04 10:41:08 -07:00
Guoqing Jiang	f9a67b1182	md/bitmap: clear bitmap if bitmap_create failed If bitmap_create returns an error, we need to call either bitmap_destroy or bitmap_free to do clean up, and the selection is based on mddev->bitmap is set or not. And the sysfs_put(bitmap->sysfs_can_clear) is moved from bitmap_destroy to bitmap_free, and the comment of bitmap_create is changed as well. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-04-01 13:05:50 -07:00
Shaohua Li	ed3b98c71c	MD: add rdev reference for super write Xiao Ni reported below crash: [26396.335146] BUG: unable to handle kernel NULL pointer dereference at 00000000000002a8 [26396.342990] IP: [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod] [26396.349449] PGD 0 [26396.351468] Oops: 0002 [#1] SMP [26396.354898] Modules linked in: ext4 mbcache jbd2 raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_td [26396.408404] CPU: 5 PID: 3261 Comm: loop0 Not tainted 4.5.0 #1 [26396.414140] Hardware name: Dell Inc. PowerEdge R715/0G2DP3, BIOS 3.2.2 09/15/2014 [26396.421608] task: ffff8808339be680 ti: ffff8808365f4000 task.ti: ffff8808365f4000 [26396.429074] RIP: 0010:[<ffffffffa0425b00>] [<ffffffffa0425b00>] super_written+0x20/0x80 [md_mod] [26396.437952] RSP: 0018:ffff8808365f7c38 EFLAGS: 00010046 [26396.443252] RAX: ffffffffa0425ae0 RBX: ffff8804336a7900 RCX: ffffe8f9f7b41198 [26396.450371] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8804336a7900 [26396.457489] RBP: ffff8808365f7c50 R08: 0000000000000005 R09: 00001801e02ce3d7 [26396.464608] R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000 [26396.471728] R13: ffff8808338d9a00 R14: 0000000000000000 R15: ffff880833f9fe00 [26396.478849] FS: 00007f9e5066d740(0000) GS:ffff880237b40000(0000) knlGS:0000000000000000 [26396.486922] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [26396.492656] CR2: 00000000000002a8 CR3: 00000000019ea000 CR4: 00000000000006e0 [26396.499775] Stack: [26396.501781] ffff8804336a7900 0000000000000000 0000000000000000 ffff8808365f7c68 [26396.509199] ffffffff81308cd0 ffff8804336a7900 ffff8808365f7ca8 ffffffff81310637 [26396.516618] 00000000a0233a00 ffff880833f9fe00 0000000000000000 ffff880833fb0000 [26396.524038] Call Trace: [26396.526485] [<ffffffff81308cd0>] bio_endio+0x40/0x60 [26396.531529] [<ffffffff81310637>] blk_update_request+0x87/0x320 [26396.537439] [<ffffffff8131a20a>] blk_mq_end_request+0x1a/0x70 [26396.543261] [<ffffffff81313889>] blk_flush_complete_seq+0xd9/0x2a0 [26396.549517] [<ffffffff81313ccf>] flush_end_io+0x15f/0x240 [26396.554993] [<ffffffff8131a22a>] blk_mq_end_request+0x3a/0x70 [26396.560815] [<ffffffff8131a314>] __blk_mq_complete_request+0xb4/0xe0 [26396.567246] [<ffffffff8131a35c>] blk_mq_complete_request+0x1c/0x20 [26396.573506] [<ffffffffa04182df>] loop_queue_work+0x6f/0x72c [loop] [26396.579764] [<ffffffff81697844>] ? __schedule+0x2b4/0x8f0 [26396.585242] [<ffffffff810a7812>] kthread_worker_fn+0x52/0x170 [26396.591065] [<ffffffff810a77c0>] ? kthread_create_on_node+0x1a0/0x1a0 [26396.597582] [<ffffffff810a7238>] kthread+0xd8/0xf0 [26396.602453] [<ffffffff810a7160>] ? kthread_park+0x60/0x60 [26396.607929] [<ffffffff8169bdcf>] ret_from_fork+0x3f/0x70 [26396.613319] [<ffffffff810a7160>] ? kthread_park+0x60/0x60 md_super_write() and corresponding md_super_wait() generally are called with reconfig_mutex locked, which prevents disk disappears. There is one case this rule is broken. write_sb_page of bitmap.c doesn't hold the mutex. next_active_rdev does increase rdev reference, but it decreases the reference too early (eg, before IO finish). disk can disappear at the window. We unconditionally increase rdev reference in md_super_write() to avoid the race. Reported-and-tested-by: Xiao Ni <xni@redhat.com> Reviewed-by: Neil Brown <neilb@suse.de> Signed-off-by: Shaohua Li <shli@fb.com>	2016-03-31 10:04:18 -07:00
Wei Fang	466ad29223	md: fix a trivial typo in comments Fix a trivial typo in md_ioctl(). Signed-off-by: Wei Fang <fangwei1@huawei.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-03-31 10:04:18 -07:00
Wei Fang	816b0acf3d	md:raid1: fix a dead loop when read from a WriteMostly disk If first_bad == this_sector when we get the WriteMostly disk in read_balance(), valid disk will be returned with zero max_sectors. It'll lead to a dead loop in make_request(), and OOM will happen because of endless allocation of struct bio. Since we can't get data from this disk in this case, so continue for another disk. Signed-off-by: Wei Fang <fangwei1@huawei.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-03-31 10:04:17 -07:00
Linus Torvalds	4526b710c1	Merge tag 'md/4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md Pull MD updates from Shaohua Li: "This update mainly fixes bugs. - a raid5 discard related fix from Jes - a MD multipath bio clone fix from Ming - raid1 error handling deadlock fix from Nate and corresponding raid10 fix from myself - a raid5 stripe batch fix from Neil - a patch from Sebastian to avoid unnecessary uevent - several cleanup/debug patches" * tag 'md/4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: md/raid5: Cleanup cpu hotplug notifier raid10: include bio_end_io_list in nr_queued to prevent freeze_array hang raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang md: fix typos for stipe md/bitmap: remove redundant return in bitmap_checkpage md/raid1: remove unnecessary BUG_ON md: multipath: don't hardcopy bio in .make_request path md/raid5: output stripe state for debug md/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list Update MD git tree URL md/bitmap: remove redundant check MD: warn for potential deadlock md: Drop sending a change uevent when stopping RAID5: revert `e9e4c377e2` to fix a livelock RAID5: check_reshape() shouldn't call mddev_suspend md/raid5: Compare apples to apples (or sectors to sectors)	2016-03-21 14:18:10 -07:00
Linus Torvalds	237045fc3c	Merge branch 'for-4.6/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "This is the block driver pull request for this merge window. It sits on top of for-4.6/core, that was just sent out. This contains: - A set of fixes for lightnvm. One from Alan, fixing an overflow, and the rest from the usual suspects, Javier and Matias. - A set of fixes for nbd from Markus and Dan, and a fixup from Arnd for correct usage of the signed 64-bit divider. - A set of bug fixes for the Micron mtip32xx, from Asai. - A fix for the brd discard handling from Bart. - Update the maintainers entry for cciss, since that hardware has transferred ownership. - Three bug fixes for bcache from Eric Wheeler. - Set of fixes for xen-blk{back,front} from Jan and Konrad. - Removal of the cpqarray driver. It has been disabled in Kconfig since 2013, and we were initially scheduled to remove it in 3.15. - Various updates and fixes for NVMe, with the most important being: - Removal of the per-device NVMe thread, replacing that with a watchdog timer instead. From Christoph. - Exposing the namespace WWID through sysfs, from Keith. - Set of cleanups from Ming Lin. - Logging the controller device name instead of the underlying PCI device name, from Sagi. - And a bunch of fixes and optimizations from the usual suspects in this area" * 'for-4.6/drivers' of git://git.kernel.dk/linux-block: (49 commits) NVMe: Expose ns wwid through single sysfs entry drivers:block: cpqarray clean up brd: Fix discard request processing cpqarray: remove it from the kernel cciss: update MAINTAINERS NVMe: Remove unused sq_head read in completion path bcache: fix cache_set_flush() NULL pointer dereference on OOM bcache: cleaned up error handling around register_cache() bcache: fix race of writeback thread starting before complete initialization NVMe: Create discard zero quirk white list nbd: use correct div_s64 helper mtip32xx: remove unneeded variable in mtip_cmd_timeout() lightnvm: generalize rrpc ppa calculations lightnvm: remove struct nvm_dev->total_blocks lightnvm: rename ->nr_pages to ->nr_sects lightnvm: update closed list outside of intr context xen/blback: Fit the important information of the thread in 17 characters lightnvm: fold get bb tbl when using dual/quad plane mode lightnvm: fix up nonsensical configure overrun checking xen-blkback: advertise indirect segment support earlier ...	2016-03-18 17:13:31 -07:00
Anna-Maria Gleixner	1d034e68e2	md/raid5: Cleanup cpu hotplug notifier The raid456_cpu_notify() hotplug callback lacks handling of the CPU_UP_CANCELED case. That means if CPU_UP_PREPARE fails, the scratch buffer is leaked. Add handling for CPU_UP_CANCELED[_FROZEN] hotplug notifier transitions to free the scratch buffer. CC: Shaohua Li <shli@kernel.org> CC: linux-raid@vger.kernel.org Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de> Signed-off-by: Shaohua Li <shli@fb.com>	2016-03-17 14:30:15 -07:00
Shaohua Li	23ddba80eb	raid10: include bio_end_io_list in nr_queued to prevent freeze_array hang This is the raid10 counterpart of the bug fixed by Nate (raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang) Fixes: 95af587e95(md/raid10: ensure device failure recorded before write request returns) Cc: stable@vger.kernel.org (V4.3+) Cc: Nate Dailey <nate.dailey@stratus.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-03-17 14:27:01 -07:00
Nate Dailey	ccfc7bf1f0	raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang If raid1d is handling a mix of read and write errors, handle_read_error's call to freeze_array can get stuck. This can happen because, though the bio_end_io_list is initially drained, writes can be added to it via handle_write_finished as the retry_list is processed. These writes contribute to nr_pending but are not included in nr_queued. If a later entry on the retry_list triggers a call to handle_read_error, freeze array hangs waiting for nr_pending == nr_queued+extra. The writes on the bio_end_io_list aren't included in nr_queued so the condition will never be satisfied. To prevent the hang, include bio_end_io_list writes in nr_queued. There's probably a better way to handle decrementing nr_queued, but this seemed like the safest way to avoid breaking surrounding code. I'm happy to supply the script I used to repro this hang. Fixes: 55ce74d4bfe1b(md/raid1: ensure device failure recorded before write request returns.) Cc: stable@vger.kernel.org (v4.3+) Signed-off-by: Nate Dailey <nate.dailey@stratus.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-03-17 14:24:51 -07:00
Linus Torvalds	70477371dc	Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "Here is the crypto update for 4.6: API: - Convert remaining crypto_hash users to shash or ahash, also convert blkcipher/ablkcipher users to skcipher. - Remove crypto_hash interface. - Remove crypto_pcomp interface. - Add crypto engine for async cipher drivers. - Add akcipher documentation. - Add skcipher documentation. Algorithms: - Rename crypto/crc32 to avoid name clash with lib/crc32. - Fix bug in keywrap where we zero the wrong pointer. Drivers: - Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver. - Add PIC32 hwrng driver. - Support BCM6368 in bcm63xx hwrng driver. - Pack structs for 32-bit compat users in qat. - Use crypto engine in omap-aes. - Add support for sama5d2x SoCs in atmel-sha. - Make atmel-sha available again. - Make sahara hashing available again. - Make ccp hashing available again. - Make sha1-mb available again. - Add support for multiple devices in ccp. - Improve DMA performance in caam. - Add hashing support to rockchip" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits) crypto: qat - remove redundant arbiter configuration crypto: ux500 - fix checks of error code returned by devm_ioremap_resource() crypto: atmel - fix checks of error code returned by devm_ioremap_resource() crypto: qat - Change the definition of icp_qat_uof_regtype hwrng: exynos - use __maybe_unused to hide pm functions crypto: ccp - Add abstraction for device-specific calls crypto: ccp - CCP versioning support crypto: ccp - Support for multiple CCPs crypto: ccp - Remove check for x86 family and model crypto: ccp - memset request context to zero during import lib/mpi: use "static inline" instead of "extern inline" lib/mpi: avoid assembler warning hwrng: bcm63xx - fix non device tree compatibility crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode. crypto: qat - The AE id should be less than the maximal AE number lib/mpi: Endianness fix crypto: rockchip - add hash support for crypto engine in rk3288 crypto: xts - fix compile errors crypto: doc - add skcipher API documentation crypto: doc - update AEAD AD handling ...	2016-03-17 11:22:54 -07:00
Bryn M. Reeves	98dbc9c6c6	dm: fix rq_end_stats() NULL pointer in dm_requeue_original_request() An "old" (.request_fn) DM 'struct request' stores a pointer to the associated 'struct dm_rq_target_io' in rq->special. dm_requeue_original_request(), previously named dm_requeue_unmapped_original_request(), called dm_unprep_request() to reset rq->special to NULL. But rq_end_stats() would go on to hit a NULL pointer deference because its call to tio_from_request() returned NULL. Fix this by calling rq_end_stats() _before_ dm_unprep_request() Signed-off-by: Bryn M. Reeves <bmr@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Fixes: `e262f34741` ("dm stats: add support for request-based DM devices") Cc: stable@vger.kernel.org # 4.2+	2016-03-14 17:04:34 -04:00
Guoqing Jiang	d85326cf86	md: fix typos for stipe Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-03-14 11:36:10 -07:00
Guoqing Jiang	c6f0b9f195	md/bitmap: remove redundant return in bitmap_checkpage The "return 0" is not needed since bitmap_checkpage will finally return 0 for the case. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-03-14 11:36:07 -07:00
Guoqing Jiang	b3c95b425e	md/raid1: remove unnecessary BUG_ON Since bitmap_start_sync will not return until sync_blocks is not less than PAGE_SIZE>>9, so the BUG_ON is not needed anymore. Signed-off-by: Guoqing Jiang <gqjiang@suse.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-03-14 11:35:58 -07:00
Ming Lei	fafcde3ac1	md: multipath: don't hardcopy bio in .make_request path Inside multipath_make_request(), multipath maps the incoming bio into low level device's bio, but it is totally wrong to copy the bio into mapped bio via 'mapped_bio = bio'. For example, .__bi_remaining is kept in the copy, especially if the incoming bio is chained to via bio splitting, so .bi_end_io can't be called for the mapped bio at all in the completing path in this kind of situation. This patch fixes the issue by using clone style. Cc: stable@vger.kernel.org (v3.14+) Reported-and-tested-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Shaohua Li <shli@fb.com>	2016-03-14 11:32:26 -07:00
Mike Snitzer	c3667cc619	dm thin: consistently return -ENOSPC if pool has run out of data space Commit `0a927c2f02` ("dm thin: return -ENOSPC when erroring retry list due to out of data space") was a step in the right direction but didn't go far enough. Add a new 'out_of_data_space' flag to 'struct pool' and set it if/when the pool runs of of data space. This fixes cell_error() and error_retry_list() to not blindly return -EIO. We cannot rely on the 'error_if_no_space' feature flag since it is transient (in that it can be reset once space is added, plus it only controls whether errors are issued, it doesn't reflect whether the pool is actually out of space). Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-03-11 16:15:22 -05:00
Mike Snitzer	843f0f2e8f	dm cache: bump the target version Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-03-10 17:12:12 -05:00
Joe Thornber	d14fcf3dd7	dm cache: make sure every metadata function checks fail_io Otherwise operations may be attempted that will only ever go on to crash (since the metadata device is either missing or unreliable if 'fail_io' is set). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org	2016-03-10 17:12:12 -05:00
Mike Snitzer	3f0680402c	dm: add missing newline between DM_DEBUG_BLOCK_STACK_TRACING and DM_BUFIO Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-03-10 17:12:11 -05:00
Mike Snitzer	7dd85bb0e9	dm cache policy smq: clarify that mq registration failure was for 'mq' Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-03-10 17:12:11 -05:00
Mike Snitzer	c80914e81e	dm: return error if bio_integrity_clone() fails in clone_bio() clone_bio() now checks if bio_integrity_clone() returned an error rather than just drop it on the floor. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-03-10 17:12:10 -05:00
Joe Thornber	2eae9e4489	dm thin metadata: don't issue prefetches if a transaction abort has failed If a transaction abort has failed then we can no longer use the metadata device. Typically this happens if the superblock is unreadable. This fix addresses a crash seen during metadata device failure testing. Fixes: `8a01a6af75` ("dm thin: prefetch missing metadata pages") Cc: stable@vger.kernel.org # 3.19+ Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-03-10 17:12:09 -05:00
DingXiang	4df2bf466a	dm snapshot: disallow the COW and origin devices from being identical Otherwise loading a "snapshot" table using the same device for the origin and COW devices, e.g.: echo "0 20971520 snapshot 253:3 253:3 P 8" \| dmsetup create snap will trigger: BUG: unable to handle kernel NULL pointer dereference at 0000000000000098 [ 1958.979934] IP: [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot] [ 1958.989655] PGD 0 [ 1958.991903] Oops: 0000 [#1] SMP ... [ 1959.059647] CPU: 9 PID: 3556 Comm: dmsetup Tainted: G IO 4.5.0-rc5.snitm+ #150 ... [ 1959.083517] task: ffff8800b9660c80 ti: ffff88032a954000 task.ti: ffff88032a954000 [ 1959.091865] RIP: 0010:[<ffffffffa040efba>] [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot] [ 1959.104295] RSP: 0018:ffff88032a957b30 EFLAGS: 00010246 [ 1959.110219] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000001 [ 1959.118180] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff880329334a00 [ 1959.126141] RBP: ffff88032a957b50 R08: 0000000000000000 R09: 0000000000000001 [ 1959.134102] R10: 000000000000000a R11: f000000000000000 R12: ffff880330884d80 [ 1959.142061] R13: 0000000000000008 R14: ffffc90001c13088 R15: ffff880330884d80 [ 1959.150021] FS: 00007f8926ba3840(0000) GS:ffff880333440000(0000) knlGS:0000000000000000 [ 1959.159047] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1959.165456] CR2: 0000000000000098 CR3: 000000032f48b000 CR4: 00000000000006e0 [ 1959.173415] Stack: [ 1959.175656] ffffc90001c13040 ffff880329334a00 ffff880330884ed0 ffff88032a957bdc [ 1959.183946] ffff88032a957bb8 ffffffffa040f225 ffff880329334a30 ffff880300000000 [ 1959.192233] ffffffffa04133e0 ffff880329334b30 0000000830884d58 00000000569c58cf [ 1959.200521] Call Trace: [ 1959.203248] [<ffffffffa040f225>] dm_exception_store_create+0x1d5/0x240 [dm_snapshot] [ 1959.211986] [<ffffffffa040d310>] snapshot_ctr+0x140/0x630 [dm_snapshot] [ 1959.219469] [<ffffffffa0005c44>] ? dm_split_args+0x64/0x150 [dm_mod] [ 1959.226656] [<ffffffffa0005ea7>] dm_table_add_target+0x177/0x440 [dm_mod] [ 1959.234328] [<ffffffffa0009203>] table_load+0x143/0x370 [dm_mod] [ 1959.241129] [<ffffffffa00090c0>] ? retrieve_status+0x1b0/0x1b0 [dm_mod] [ 1959.248607] [<ffffffffa0009e35>] ctl_ioctl+0x255/0x4d0 [dm_mod] [ 1959.255307] [<ffffffff813304e2>] ? memzero_explicit+0x12/0x20 [ 1959.261816] [<ffffffffa000a0c3>] dm_ctl_ioctl+0x13/0x20 [dm_mod] [ 1959.268615] [<ffffffff81215eb6>] do_vfs_ioctl+0xa6/0x5c0 [ 1959.274637] [<ffffffff81120d2f>] ? __audit_syscall_entry+0xaf/0x100 [ 1959.281726] [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70 [ 1959.288814] [<ffffffff81216449>] SyS_ioctl+0x79/0x90 [ 1959.294450] [<ffffffff8167e4ae>] entry_SYSCALL_64_fastpath+0x12/0x71 ... [ 1959.323277] RIP [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot] [ 1959.333090] RSP <ffff88032a957b30> [ 1959.336978] CR2: 0000000000000098 [ 1959.344121] ---[ end trace b049991ccad1169e ]--- Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1195899 Cc: stable@vger.kernel.org Signed-off-by: Ding Xiang <dingxiang@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-03-10 17:12:09 -05:00
Joe Thornber	9ed84698fd	dm cache: make the 'mq' policy an alias for 'smq' smq seems to be performing better than the old mq policy in all situations, as well as using a quarter of the memory. Make 'mq' an alias for 'smq' when choosing a cache policy. The tunables that were present for the old mq are faked, and have no effect. mq should be considered deprecated now. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-03-10 17:12:08 -05:00
Bob Liu	e233d800a9	dm: drop unnecessary assignment of md->queue md->queue and q are the same thing in dm_old_init_request_queue() and dm_mq_init_request_queue(). Also drop the temporary 'struct request_queue *q' in dm_old_init_request_queue(). Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-03-10 17:12:07 -05:00
Mike Snitzer	032482fda4	dm: reorder 'struct mapped_device' members to fix alignment and holes Saves 16 bytes by eliminating 4 4byte holes but more importantly: numerous members that crossed cachelines were fixed. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-03-10 17:12:07 -05:00
Mike Snitzer	1d3aa6f683	dm: remove dummy definition of 'struct dm_table' Change the map pointer in 'struct mapped_device' from 'struct dm_table __rcu ' to 'void __rcu ' to avoid the need for the dummy definition. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-03-10 17:12:06 -05:00
Mike Snitzer	115485e83f	dm: add 'dm_numa_node' module parameter Allows user to control which NUMA node the memory for DM device structures (e.g. mapped_device, request_queue, gendisk, blk_mq_tag_set) is allocated from. Defaults to NUMA_NO_NODE (-1). Allowable range is from -1 until the last online NUMA node id. Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2016-03-10 17:12:06 -05:00

1 2 3 4 5 ...

4154 Commits